Psychometrika 


CONTENTS 


ESTIMATION OF THE CORRELATION COEFFICIENT IN THE 
CASE OF A BIVARIATE NORMAL POPULATION WHEN 
ONE OF THE VARIABLES IS DICHOTOMIZED 


J. S. MARITZ
A SIMPLE PROCEDURE FOR REARRANGING MATRICES
W. A. GIBSON

A TABLE FOR THE RAPID DETERMINATION OF THE TETRACHORIC
CORRELATION COEFFICIENT
MELVIN D. DAVIDOFF AND HOWARD W. GOHEEN

A SPECIAL REVIEW OF HAROLD GULLIKSEN, Theory of Mental Tests
LOUIS GUTTMAN
COMMENTS ON GUTTMAN’S REVIEW OF Theory of Mental Tests
HAROLD GULLIKSEN
A FACTOR-ANALYTIC STUDY OF REASONING ABILITIES
RUSSELL F. GREEN, J. P. GUILFORD, PAUL R. CHRISTENSEN, AND
ANDREW L. COMREY
A METHOD FOR FACTORING LARGE NUMBERS OF ITEMS
ROBERT J. WHERRY AND BEN J. WINER
PHILIP VERNON, The Structure of Human Abilities
A Review by JOHN W. FRENCH
DOROTHY C. ADKINS AND SAMUEL B. LYERLY, Factor Analysis
of Reasoning Tests
A Review by LYLE V. JONES
LLOYD A. JEFFRESS, Cerebral Mechanisms in Behavior, The Hixon
Symposium
A Review by G. A. MILLER
NORMAN FREDERIKSEN AND W. B. SCHRADER, Adjustment to
College
A Review by EDWARD S. BORDIN


BOOKS RECEIVED 








VOLUME EIGHTEEN JUNE 1953 NUMBER 2 














Nominees for the Council of Directors of the Psychometric Society 


Two new members of the Council of Directors of the Psychometric 
Society are to be elected at the regular annual meeting of the Society in 1953. 
The following persons have been nominated: 


Dr. Walter Deemer Dr. Richard Gaylord 
Dr. Paul Dwyer Dr. E. F. Lindquist 








PSYCHOMETRIKA—VOL. 18, NO. 2 
JUNE, 1953 


ESTIMATION OF THE CORRELATION COEFFICIENT IN THE 
CASE OF A BIVARIATE NORMAL POPULATION WHEN ONE OF 
THE VARIABLES IS DICHOTOMIZED* 


J. S. MARITZ


SOUTH AFRICAN COUNCIL FOR SCIENTIFIC AND INDUSTRIAL RESEARCH 


It is shown that the problem of estimating the correlation coefficient
of a bivariate normal population when one of the variables is dichotomized
may be attacked with “probit analysis” methods. This represents an
extension of the work of Gillman and Goode (3), as it was possible to find by
this approach an approximation to the large-sample variance of the resulting
estimate $G$ of $\rho$. An empirical investigation was undertaken with the object
of obtaining some information about the distribution of $G$ for large sample
size. Methods for determining the “pass-fail” cut-off are considered.


1. Introduction. It is often necessary to estimate the correlation
coefficient between two variates when one of them is dichotomized. If it is
reasonable to assume that the joint distribution of these variates is normal,
the statistic $r_{\mathrm{bis}}$ (biserial $r$) is commonly used as an estimator of the parameter
$\rho$. It is well known that when the variate which is not dichotomized (which
we will call the “continuous” variate) is restricted in some way, then $r_{\mathrm{bis}}$ is
no longer a consistent estimate of $\rho$ (4). Gillman and Goode (3) have suggested
an alternative procedure resulting in the estimator $G$ of $\rho$, which appears
to have none of the disadvantages of $r_{\mathrm{bis}}$. It has been pointed out by Sichel (4)
that the weights used by these authors for fitting their regression line are
not strictly the best and that it would add considerably to the usefulness of
this method if something were known about the sampling distribution of $G$.
It is our object to state some results which were the outcome of an
investigation of these points.


2. Case of No Restriction 

2.1. We will denote the “continuous” variate by $x$ and the dichotomized
variate by $y$. It will be assumed that there is no restriction on $x$ and that
the $x$ distribution is normal with mean 0 and variance 1; also that

\[ p(y \mid x)\,dy = \frac{1}{\sqrt{2\pi}} \exp\left[-\tfrac{1}{2}(y - bx)^2\right] dy. \tag{1} \]


*The author wishes to thank the South African Council for Scientific and Industrial 
Research for permission to publish this paper. The invaluable assistance of Mr. H. S. 
Sichel in the preparation of this manuscript is also gratefully acknowledged. 




If the coefficient of correlation between $x$ and $y$ is $\rho$ it follows immediately
that the marginal $y$ distribution is normal with mean 0 and variance $1/(1 - \rho^2)$,
and

\[ b = \frac{\rho}{\sqrt{1 - \rho^2}} \quad \text{or} \quad \rho = \frac{b}{\sqrt{1 + b^2}}, \tag{2} \]

this being the result which Gillman and Goode have used. If an estimate $\hat{b}$ of
$b$ can be found it seems reasonable to take as estimate of $\rho$ the quantity

\[ G = \frac{\hat{b}}{\sqrt{1 + \hat{b}^2}}. \tag{3} \]





We want to suggest a procedure for estimating b which differs slightly 
from that of Gillman and Goode and which is simply an application of the 
probit analysis technique (1) to this problem. 
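
Relations (2) and (3) are easily checked numerically. The following is a minimal sketch in Python; the function names are hypothetical and are introduced here for illustration only.

```python
import math

def slope_from_rho(rho):
    """Equation (2): regression slope b implied by a correlation rho."""
    return rho / math.sqrt(1.0 - rho * rho)

def rho_from_slope(b):
    """Equations (2) and (3): correlation implied by a slope b (G when b is an estimate)."""
    return b / math.sqrt(1.0 + b * b)

# The two forms of (2) are inverses of one another, and the marginal
# variance of y, 1 + b^2, equals 1 / (1 - rho^2).
rho = 0.6
b = slope_from_rho(rho)
print(round(b, 4), round(rho_from_slope(b), 4))
print(round(1.0 + b * b, 4), round(1.0 / (1.0 - rho * rho), 4))
```

With $\rho = .6$ the slope is $b = .75$, and the marginal variance of $y$ is 1.5625 by either expression.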

2.2. Observations of which the $y$ values are greater than or equal to $c$,
say, will for convenience be called “successes.” It follows from (1) that the
proportion of successes at a particular $x$ value will be

\[ \pi = \int_{c}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\left[-\tfrac{1}{2}(y - bx)^2\right] dy, \tag{4} \]

which we may rewrite as

\[ \pi = \int_{c - bx}^{\infty} \operatorname{erf}(u)\,du, \tag{5} \]

where

\[ \operatorname{erf}(u) = \frac{1}{\sqrt{2\pi}} \exp\left[-\tfrac{1}{2}u^2\right]. \]

If we now define a “probit,” $Y$, by the relation

\[ \pi = \int_{-Y + 5}^{\infty} \operatorname{erf}(u)\,du, \tag{6} \]

we have

\[ Y = 5 - c + bx. \tag{7} \]
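
The probit transformation of (6) and (7) may be illustrated as follows. This is a sketch only: it relies on Python's `statistics.NormalDist`, and the function names are hypothetical. Note that the symbol “erf” in the text denotes the standard normal density, not the error function of modern libraries.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def success_proportion(x, b, c):
    """Equation (5): proportion of successes at x, i.e. P(Z > c - b*x)."""
    return 1.0 - _STD_NORMAL.cdf(c - b * x)

def probit(pi):
    """Equation (6): the Y for which P(Z > 5 - Y) = pi, i.e. Y = 5 + inverse CDF of pi."""
    return 5.0 + _STD_NORMAL.inv_cdf(pi)

# Equation (7): the probit of the success proportion is linear in x.
b, c = 0.7, 0.6  # illustrative values, not taken from the paper
for x in (-1.0, 0.0, 1.0):
    print(x, round(probit(success_proportion(x, b, c)), 4), round(5.0 - c + b * x, 4))
```

The two printed columns agree, which is the content of equation (7). The function `probit` requires a proportion strictly between 0 and 1.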


2.3. In practice we do not usually work with the proportion of successes
at an exact $x$-level. The observations in a certain $x$-interval are grouped
together, and we write

\[ \pi_i = \int_{c - b x_i}^{\infty} \operatorname{erf}(u)\,du, \]

where $\pi_i$ is the proportion of successes in the $i$th $x$-interval and $x_i$ is the
mean of that interval. Evidently a grouping error is introduced, and this
problem has been fully discussed by K. D. Tocher (5). However, the grouping
error is not very large and may be reduced by choosing a finer grouping, so
that in what follows we will disregard the grouping error, thus simplifying
the discussion somewhat.

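2.4. If our $x$-scale is divided into $k$ intervals, the proportion of the
total population in the $i$th interval being $P_i$, then the probability that in a
sample of size $N$, $n_1$ observations occur in the first interval, $n_2$ in the second,
and so on, is given by the multinomial distribution. If the number of
“successes” in the $i$th interval is $m_i$ (out of a possible $n_i$), then we have

\[ p(m_i \mid n_i) = \binom{n_i}{m_i} \pi_i^{m_i} (1 - \pi_i)^{n_i - m_i}. \tag{8} \]

Making the reasonable assumption that

\[ p(m_1, m_2, \ldots, m_k \mid n_1, \ldots, n_k) = p(m_1 \mid n_1) \times p(m_2 \mid n_2) \times \cdots \times p(m_k \mid n_k), \tag{9} \]

we have

\[ p(m_1, \ldots, m_k \mid N) = p(m_1 \mid n_1) \times \cdots \times p(m_k \mid n_k) \times p(n_1, \ldots, n_k \mid N) = e^{L}, \tag{10} \]

say, from which follows

\[ L = \log \frac{N!}{\prod_i n_i!} + \sum_i n_i \log P_i + \sum_i \log \binom{n_i}{m_i} + \sum_i m_i \log \pi_i + \sum_i (n_i - m_i) \log (1 - \pi_i). \tag{11} \]

If $\theta$ is a parameter not contained in $P_i$ then

\[ \frac{\partial L}{\partial \theta} = \sum_i \frac{n_i (p_i - \pi_i)}{\pi_i (1 - \pi_i)} \frac{\partial \pi_i}{\partial \theta}, \tag{12} \]

where $p_i = m_i / n_i$.

The parameters which we want to estimate are [see equation (7)]
$a = 5 - c$ and $b$, so that, since

\[ \frac{\partial \pi_i}{\partial a} = \operatorname{erf}(-Y_i + 5), \qquad \frac{\partial \pi_i}{\partial b} = x_i \operatorname{erf}(-Y_i + 5), \]

we have the equations

\[ \frac{\partial L}{\partial a} = \sum_i \frac{n_i (p_i - \pi_i)}{\pi_i (1 - \pi_i)} \operatorname{erf}(-Y_i + 5) = 0, \qquad
   \frac{\partial L}{\partial b} = \sum_i \frac{n_i (p_i - \pi_i)}{\pi_i (1 - \pi_i)}\, x_i \operatorname{erf}(-Y_i + 5) = 0, \tag{13} \]

which may be used to find the estimates $\hat{a}$ and $\hat{b}$ of $a$ and $b$.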








These equations are identical in form with the usual “probit” equations
and may consequently be solved in exactly the same way. The mathematical
details of the method of solution are given in Finney’s book (1) and will
not be repeated here, but an example will be given in 2.8 to illustrate how
the solution is carried out in practice.
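
For readers who wish to see the iteration spelled out, the following Python sketch solves equations (13) by repeated weighted least-squares cycles on working probits. It is an illustration under stated assumptions, not a transcription of Finney's tables: the function names, the starting values, and the fixed number of cycles are all hypothetical choices, and the data are the class means, frequencies, and successes of Table 2.

```python
import math

def phi(t):
    """Standard normal density (written "erf" in the text)."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def upper_tail(t):
    """P(Z > t) for a standard normal Z."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))

def probit_cycle(x, n, m, a, b):
    """One weighted least-squares cycle for the line Y = a + b*x."""
    s_w = s_wx = s_wy = s_wxy = s_wxx = 0.0
    for xi, ni, mi in zip(x, n, m):
        Y = a + b * xi                       # provisional probit
        pi = upper_tail(5.0 - Y)             # expected proportion of successes
        z = phi(5.0 - Y)                     # ordinate erf(-Y + 5)
        nw = ni * z * z / (pi * (1.0 - pi))  # weight n_i * w_i
        y = Y + (mi / ni - pi) / z           # working probit
        s_w += nw
        s_wx += nw * xi
        s_wy += nw * y
        s_wxy += nw * xi * y
        s_wxx += nw * xi * xi
    b_new = (s_wxy - s_wx * s_wy / s_w) / (s_wxx - s_wx ** 2 / s_w)
    a_new = s_wy / s_w - b_new * s_wx / s_w
    return a_new, b_new

def fit_probit_line(x, n, m, a=5.0, b=0.5, cycles=20):
    """Repeat the cycle a fixed number of times (assumed sufficient here)."""
    for _ in range(cycles):
        a, b = probit_cycle(x, n, m, a, b)
    return a, b

# Class means, frequencies, and successes from Table 2.
X = [2.446, 1.961, 1.468, 0.979, 0.490, 0.000,
     -0.490, -0.979, -1.468, -1.961, -2.446]
N_CELL = [3, 7, 15, 24, 35, 43, 27, 28, 9, 4, 5]
M_CELL = [2, 5, 11, 15, 11, 14, 2, 2, 1, 0, 0]

a_hat, b_hat = fit_probit_line(X, N_CELL, M_CELL)
print(round(a_hat, 3), round(b_hat, 3))
```

At a fixed point of the cycle the weighted residuals of the working probits sum to zero, which is equations (13). Because the cycle is carried to convergence here, the result need not coincide with the single hand cycle of 2.8.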

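2.5. We may now obtain the large-sample variances and the covariance
of the estimates $\hat{a}$ and $\hat{b}$ of $a$ and $b$:

\[ \begin{bmatrix} \operatorname{var}(\hat{a}) & \operatorname{cov}(\hat{a},\hat{b}) \\ \operatorname{cov}(\hat{a},\hat{b}) & \operatorname{var}(\hat{b}) \end{bmatrix}
 = \begin{bmatrix} -E\left(\dfrac{\partial^2 L}{\partial a^2}\right) & -E\left(\dfrac{\partial^2 L}{\partial a\,\partial b}\right) \\[2ex] -E\left(\dfrac{\partial^2 L}{\partial a\,\partial b}\right) & -E\left(\dfrac{\partial^2 L}{\partial b^2}\right) \end{bmatrix}^{-1}. \tag{14} \]

It follows from (12) that

\[ \frac{\partial^2 L}{\partial a^2} = \sum_i \left\{ \frac{n_i (p_i - \pi_i)}{\pi_i (1 - \pi_i)} \frac{\partial^2 \pi_i}{\partial a^2}
 - \left( \frac{\partial \pi_i}{\partial a} \right)^2 \left[ \frac{n_i}{\pi_i (1 - \pi_i)} + \frac{n_i (p_i - \pi_i)(1 - 2\pi_i)}{\pi_i^2 (1 - \pi_i)^2} \right] \right\}, \]

giving

\[ E\left( \frac{\partial^2 L}{\partial a^2} \right) = -N \sum_{i=1}^{k} P_i w_i, \tag{15} \]

where

\[ w_i = \frac{[\operatorname{erf}(-Y_i + 5)]^2}{\pi_i (1 - \pi_i)}. \]

Similarly,

\[ E\left( \frac{\partial^2 L}{\partial a\,\partial b} \right) = -N \sum_i P_i w_i x_i, \qquad
   E\left( \frac{\partial^2 L}{\partial b^2} \right) = -N \sum_i P_i w_i x_i^2. \tag{16} \]

We may derive, using (14), (15), and (16), the results

\[ \operatorname{var}(\hat{b}) = \frac{1}{N \sum_i P_i w_i (x_i - \bar{x})^2}, \qquad
   \operatorname{var}(\hat{a}) = \frac{\sum_i P_i w_i x_i^2}{N \sum_i P_i w_i \sum_i P_i w_i (x_i - \bar{x})^2}, \qquad
   \operatorname{cov}(\hat{a},\hat{b}) = \frac{-\bar{x}}{N \sum_i P_i w_i (x_i - \bar{x})^2}, \tag{17} \]

where

\[ \bar{x} = \frac{\sum_i P_i w_i x_i}{\sum_i P_i w_i}. \]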


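2.6. From equation (3) it follows that, approximately for large $N$,

\[ \operatorname{var}(G) = \frac{\operatorname{var}(\hat{b})}{(1 + b^2)^3}, \tag{18} \]

where $\operatorname{var}(\hat{b})$ is given by (17).

If we consider the grouping of the $x$-variate to become finer we see that

\[ \sum_{i=1}^{k} P_i w_i (x_i - \bar{x})^2 \]

tends to

\[ \int_{-\infty}^{\infty} \operatorname{erf}(x) \frac{[\operatorname{erf}(c - bx)]^2}{g(c - bx)[1 - g(c - bx)]} (x - \bar{x})^2\,dx, \]

where

\[ g(c - bx) = \int_{c - bx}^{\infty} \operatorname{erf}(u)\,du, \]

and

\[ \bar{x} = \frac{\displaystyle\int_{-\infty}^{\infty} \operatorname{erf}(x) \frac{[\operatorname{erf}(c - bx)]^2}{g(c - bx)[1 - g(c - bx)]}\, x\,dx}
               {\displaystyle\int_{-\infty}^{\infty} \operatorname{erf}(x) \frac{[\operatorname{erf}(c - bx)]^2}{g(c - bx)[1 - g(c - bx)]}\,dx}. \]

We may therefore say that, asymptotically,

\[ \operatorname{var}(\hat{b}) = \frac{1}{N} \left\{ \int_{-\infty}^{\infty} \operatorname{erf}(x) \frac{[\operatorname{erf}(c - bx)]^2}{g(c - bx)[1 - g(c - bx)]} (x - \bar{x})^2\,dx \right\}^{-1}, \tag{19} \]

while similar expressions hold for $\operatorname{var}(\hat{a})$ and $\operatorname{cov}(\hat{a},\hat{b})$. Expressions (17)
therefore give what may be called “grouped” approximations to
the true asymptotic variances and covariance of $\hat{a}$ and $\hat{b}$.

In practice we must always use (17), and since we do not know any of
the parameters, we must substitute in the expressions their estimates
obtained from the sample. Thus we use, for example,

\[ \text{est. } \operatorname{var}(\hat{b}) = \frac{1}{\sum_i n_i \hat{w}_i x_i^2 - \dfrac{\left( \sum_i n_i \hat{w}_i x_i \right)^2}{\sum_i n_i \hat{w}_i}}, \tag{20} \]

remembering that this expression gives a “grouped” estimate of $\operatorname{var}(\hat{b})$. The
calculation of this quantity will be illustrated in our example of 2.8.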








2.7. An interesting comparison may be made between the sampling
variances of the estimators $G$ and $r_{\mathrm{bis}}$ of $\rho$ for large samples. If we denote by
$\beta$ the cut-off between “successes” and “failures” on the $y$-variate in
“standard measures,” i.e.,

\[ \beta = \frac{c}{\sqrt{1 + b^2}} = \frac{5 - a}{\sqrt{1 + b^2}}, \tag{21} \]

then $N \operatorname{var}(G)$ [cf. equations (18) and (19)] and $N \operatorname{var}(r_{\mathrm{bis}})$ may be tabulated for
various values of $\rho$ and $\beta$. We have calculated expression (19) approximately
by grouping into very fine intervals and summing, and Table 1 gives $N \operatorname{var}(G)$
and $N \operatorname{var}(r_{\mathrm{bis}})$.


TABLE 1

N var(G) and N var(r_bis) for Various Values of $\rho$ and $\beta$

                 $\rho$ = .4                $\rho$ = .6                $\rho$ = .8
$\beta$    N var(G)  N var(r_bis)   N var(G)  N var(r_bis)   N var(G)  N var(r_bis)

  0        1.135      1.195          .704       .798          .271       .376
 .5        1.256      1.327          .785       .906          .311       .452
1.0        1.705      1.822         1.095      1.322          .438       .757
1.5        2.881      3.125         1.910      2.458          .795      1.658




Evidently $G$ is a “more efficient” estimate of $\rho$ than $r_{\mathrm{bis}}$ when the
population from which we sample is not restricted. There are, however, other
advantages attached to the use of this analysis and these will be mentioned
in some of the following paragraphs.
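
The $N \operatorname{var}(G)$ entries of Table 1 can be approximated by evaluating (19) on a fine grid and applying (18). The Python sketch below does this with a simple midpoint rule; the function names, the integration limits of $\pm 8$, and the number of steps are assumptions made for illustration, and “erf” of the text appears as `phi`.

```python
import math

def phi(t):
    """Standard normal density (written "erf" in the text)."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def upper_tail(t):
    """g(t) = P(Z > t), computed with erfc to stay accurate in the tails."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))

def n_var_G(rho, beta, lo=-8.0, hi=8.0, steps=4000):
    """N var(G) from equations (18) and (19), by midpoint summation."""
    b = rho / math.sqrt(1.0 - rho * rho)   # equation (2)
    c = beta * math.sqrt(1.0 + b * b)      # equation (21)
    h = (hi - lo) / steps
    s0 = s1 = s2 = 0.0
    for j in range(steps):
        x = lo + (j + 0.5) * h
        t = c - b * x
        g = upper_tail(t)
        one_minus_g = upper_tail(-t)
        weight = phi(x) * phi(t) ** 2 / (g * one_minus_g) * h
        s0 += weight
        s1 += weight * x
        s2 += weight * x * x
    n_var_b = 1.0 / (s2 - s1 * s1 / s0)    # equation (19) multiplied by N
    return n_var_b / (1.0 + b * b) ** 3    # equation (18)

for beta in (0.0, 0.5, 1.0, 1.5):
    print(beta, [round(n_var_G(rho, beta), 3) for rho in (0.4, 0.6, 0.8)])
```

The printed rows may be compared with the $N \operatorname{var}(G)$ columns of Table 1; small differences in the last figure are to be expected from the different grouping used.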

2.8. For our example we have a sample of 200 individuals drawn from
a bivariate normal distribution, given that the $x$-variate has expected value
zero. This example is typical of cases when some standardized test is compared
with a “pass-fail” criterion. The data are given in Table 2.

The procedure is to calculate, first, the proportion of “successes” in
each “cell” and then to transform these proportions (or percentages) to
empirical probits with the help of tables (1). These probits are then plotted
against the $x$-values and a first approximation to the regression line is drawn
“by eye.” From the provisional line the $Y$-values are read off; they are the
ordinates on the line corresponding to the $X$-values. These $Y$-values and the
percentages of “successes” are used to find the weighting coefficients ($w_i$) and
the “working” probits. Tables are available from which these quantities
may be read off quite easily (1, 2).








TABLE 2

Illustrative Example

Class             Class    Class      No. of     Prop. of   Empirical  Provisional          Working
Interval          Mean X   Frequency  Successes  Successes  Probit     Probit Y     nw      Probit y    nwX
                           n          m

 2.25 -  2.75      2.446      3          2         .67       5.44       6.16        1.15     5.14       2.81290
 1.75 -  2.25      1.961      7          5         .71       5.55       5.79        3.54     5.53       6.94194
 1.25 -  1.75      1.468     15         11         .73       5.61       5.40        9.01     5.60      13.22668
  .75 -  1.25       .979     24         15         .62       5.31       5.04       15.27     5.30      14.94933
  .25 -   .75       .490     35         11         .31       4.50       4.67       21.41     4.51      10.49000
 −.25 -   .25       .000     43         14         .33       4.56       4.25       22.25     4.56        .00000
 −.75 -  −.25      −.490     27          2         .07       3.52       3.90       10.93     3.60      −5.35570
−1.25 -  −.75      −.979     28          2         .07       3.52       3.49        7.44     3.53      −7.28376
−1.75 - −1.25     −1.468      9          1         .11       3.77       3.13        1.46     4.28      −2.14328
−2.25 - −1.75     −1.961      4          0         0         −∞         2.73         .32     2.35       −.62752
−2.75 - −2.25     −2.446      5          0         0         −∞         2.35         .18     2.01       −.44028

Totals                      200         63





The new regression line is found by fitting a “least-squares” straight
line to the “working” probits with weights $n_i w_i$. For this purpose we require

\[ \sum nw = 92.96, \quad \sum nwX = 32.5712, \quad \sum nwy = 428.7571, \quad \sum nwXy = 196.9369, \quad \sum nwX^2 = 74.8951. \]

We then have

\[ \hat{b} = \frac{\sum nwXy - \dfrac{(\sum nwX)(\sum nwy)}{\sum nw}}{\sum nwX^2 - \dfrac{(\sum nwX)^2}{\sum nw}} = \frac{46.7097}{63.4828} = .7358, \]

so that $G = .5927$.








We also find

\[ \hat{a} = \frac{\sum nwy}{\sum nw} - \hat{b}\,\frac{\sum nwX}{\sum nw} = 4.3545, \]

and we can plot the regression line

\[ Y = 4.3545 + .7358X, \]

this line being a second approximation to the true estimate of the population
regression line which this sample provides. The empirical probits and the
two lines are plotted in Figure 1. The lines are quite close together and it
appeared that a further “cycle” (starting with the line calculated from the
“eye” line) was not necessary. As a rough guide it may be assumed that no
further cycle is required if the difference in slope between the two lines is
less than five degrees, provided $X$ is in standard scores (i.e., S.D. = 1) and
the $Y$’s are “probits” and when plotting a unit is represented by the same
length on the $X$ and $Y$ scales.


FIGURE 1

“Eye Fit” and Calculated Approximation to Regression Line


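We may now find estimates of $\operatorname{var}(\hat{b})$ and $\operatorname{var}(G)$. Using equation (20)
we have

\[ \text{est. } \operatorname{var}(\hat{b}) = \left\{ \sum nwX^2 - \frac{(\sum nwX)^2}{\sum nw} \right\}^{-1} = .01575, \]

and from (18)

\[ \text{est. } \operatorname{var}(G) = \frac{1}{(1 + \hat{b}^2)^3}\,[\text{est. } \operatorname{var}(\hat{b})] = .00430. \]

Thus

\[ \text{est. S.E.}(G) = .066, \]

so that finally our estimate of $\rho$ is

\[ .593 \pm .066. \]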
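
The arithmetic of this example can be retraced from the five weighted sums alone. The Python sketch below is illustrative; the function name `line_from_sums` and the dictionary keys are hypothetical and do not appear in the paper.

```python
import math

def line_from_sums(s_w, s_wx, s_wy, s_wxy, s_wxx):
    """Weighted regression line, G, and estimated variances from the sums of 2.8."""
    sxx = s_wxx - s_wx ** 2 / s_w       # corrected sum of squares of X
    sxy = s_wxy - s_wx * s_wy / s_w     # corrected sum of products
    b = sxy / sxx
    a = s_wy / s_w - b * s_wx / s_w
    var_b = 1.0 / sxx                   # equation (20)
    G = b / math.sqrt(1.0 + b * b)      # equation (3)
    var_G = var_b / (1.0 + b * b) ** 3  # equation (18)
    return {"a": a, "b": b, "G": G, "var_b": var_b,
            "var_G": var_G, "se_G": math.sqrt(var_G)}

result = line_from_sums(92.96, 32.5712, 428.7571, 196.9369, 74.8951)
for key, value in result.items():
    print(key, round(value, 5))
```

Apart from rounding in the last figure, the output agrees with the values $\hat{a} = 4.3545$, $\hat{b} = .7358$, $G = .5927$, and S.E.$(G) = .066$ given in the text.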


At this point there arises the question of the shape of the distribution
of $G$ for a sample size of two hundred. A theoretical investigation has not been
carried out, but certain practical work (of which more in 2.10) has shown
that the distribution of $G$ may, quite reasonably, be assumed normal for $N$
as large as 200.

2.9. After calculating the regression line we may carry the analysis a
little further.

(a) If equation (1) holds true it may be shown that the quantity (where
$\hat{\pi}_i$ is found from the regression line),

\[ \sum_i \frac{(m_i - n_i \hat{\pi}_i)^2}{n_i \hat{\pi}_i (1 - \hat{\pi}_i)}, \]

is distributed approximately as $\chi^2$ with $k - 2$ degrees of freedom if each
$n_i$ is not small and $N$ is large. This then provides a test of the assumption (1).


TABLE 3

Chi-Square Test of Assumption in Equation (1)

Y = 4.3545 + .7358X   $\hat{\pi}$    $n\hat{\pi}$            m        $m - n\hat{\pi}$   $(m - n\hat{\pi})^2 / [n\hat{\pi}(1 - \hat{\pi})]$

6.15                  .8749     2.62 }           2 }
5.80                  .7881     5.52 } 18.14     5 } 18     − .14           .004
5.43                  .6664    10.00 }          11 }
5.07                  .5279    12.68            15         +2.32           .899
4.72                  .3897    13.64            11         −2.64           .837
4.35                  .2578    11.09            14         +2.91          1.029
3.99                  .1562     4.22             2         −2.22          1.384
3.63                  .0853     2.39             2         − .39           .070
3.27                  .0418      .38 }           1 }
2.91                  .0183      .07 }   .49     0 }  1     + .51           .546
2.55                  .0071      .04 }           0 }

Total                                                                     4.769











For our example the calculation of $\chi^2$ is shown in Table 3. It will be
noticed that the “tails” are “grouped” to form cells with $n_i = 18$ and $n_i = 25$.
It is suggested that the smallest $n_i$ should be close to twenty. Since for this
example we find $\chi^2 = 4.769$ for 5 d.f., well below the 5 per cent point of
11.07, we may accept (1).
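
The computation of Table 3 may be sketched in Python as follows. The function name and the representation of the grouping as tuples of cell indices are hypothetical. The pooled proportion used for a grouped cell (expected successes divided by the pooled $n$) is an assumption inferred from the tabled values, and because the tabled $\hat{\pi}$ are rounded the total differs slightly from 4.769.

```python
def grouped_chi_square(n, pi_hat, m, groups):
    """Sum of (m - n*pi)^2 / [n*pi*(1 - pi)] over (possibly pooled) cells."""
    chi2 = 0.0
    for cells in groups:
        n_g = sum(n[i] for i in cells)              # pooled class frequency
        m_g = sum(m[i] for i in cells)              # pooled observed successes
        e_g = sum(n[i] * pi_hat[i] for i in cells)  # pooled expected successes
        pooled_pi = e_g / n_g
        chi2 += (m_g - e_g) ** 2 / (e_g * (1.0 - pooled_pi))
    return chi2

# Frequencies and successes from Table 2; fitted proportions from Table 3.
n = [3, 7, 15, 24, 35, 43, 27, 28, 9, 4, 5]
m = [2, 5, 11, 15, 11, 14, 2, 2, 1, 0, 0]
pi_hat = [.8749, .7881, .6664, .5279, .3897, .2578,
          .1562, .0853, .0418, .0183, .0071]

# The three cells in each tail are pooled, as in Table 3.
groups = [(0, 1, 2), (3,), (4,), (5,), (6,), (7,), (8, 9, 10)]

chi2 = grouped_chi_square(n, pi_hat, m, groups)
degrees_of_freedom = len(groups) - 2
print(round(chi2, 2), degrees_of_freedom)
```

Seven cells remain after grouping, so that $k - 2 = 5$ degrees of freedom, as stated in the text.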


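(b) (i) We may obtain an estimate of the point of dichotomy ($\beta$) by
referring to equation (21) and using the estimate

\[ \hat{\beta} = \frac{5 - \hat{a}}{\sqrt{1 + \hat{b}^2}}. \]

We may also find an estimate $\tilde{\beta}$ of $\beta$ by the relation

\[ p = \int_{\tilde{\beta}}^{\infty} \operatorname{erf}(u)\,du, \]

where $p$ = (Total number of successes in sample) / $N$.

(ii) It is of some interest to examine the large-sample variances of these
two estimates of $\beta$. We have, for large $N$,

\[ \operatorname{var}(\hat{\beta}) = \frac{1}{1 + b^2} \left\{ \operatorname{var}(\hat{a}) + \beta^2 \rho^2 \operatorname{var}(\hat{b}) + 2\beta\rho \operatorname{cov}(\hat{a},\hat{b}) \right\}, \]

\[ \operatorname{var}(\tilde{\beta}) = [\operatorname{erf}(\beta)]^{-2} \operatorname{var}(p). \]

The expressions for $\operatorname{var}(\hat{a})$, $\operatorname{var}(\hat{b})$, and $\operatorname{cov}(\hat{a},\hat{b})$ are given in equation (17),
while $\operatorname{var}(p)$ is given by the well-known result,

\[ \operatorname{var}(p) = \frac{p(1 - p)}{N}. \]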
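
The two estimates of the point of dichotomy, and the large-sample variance of the one based on the over-all proportion of successes, may be sketched in Python as follows. The function names are hypothetical, and `statistics.NormalDist` supplies the normal density, distribution function, and its inverse.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal

def beta_from_line(a_hat, b_hat):
    """Estimate of the cut-off from the fitted probit line, equation (21)."""
    return (5.0 - a_hat) / math.sqrt(1.0 + b_hat ** 2)

def beta_from_proportion(successes, N):
    """Estimate of the cut-off from the over-all proportion p of successes."""
    p = successes / N
    return Z.inv_cdf(1.0 - p)  # the value exceeded with probability p

def n_var_beta_from_proportion(beta):
    """N times the large-sample variance p(1 - p) / [N erf(beta)^2]."""
    p = 1.0 - Z.cdf(beta)
    return p * (1.0 - p) / Z.pdf(beta) ** 2

# Example of 2.8: fitted line Y = 4.3545 + .7358X, 63 successes in 200.
print(round(beta_from_line(4.3545, 0.7358), 3))
print(round(beta_from_proportion(63, 200), 3))
print([round(n_var_beta_from_proportion(t), 3) for t in (0.0, 0.5, 1.0, 1.5)])
```

The last line depends on the cut-off only, not on $\rho$, and gives approximately 1.571, 1.721, 2.280, and 3.716 for cut-offs of 0, .5, 1.0, and 1.5.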


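Some values of $N \operatorname{var}(\hat{\beta})$ and $N \operatorname{var}(\tilde{\beta})$ are given in Table 4. The values
of $N \operatorname{var}(\hat{\beta})$ were calculated in the same way as were those of $N \operatorname{var}(G)$.


TABLE 4

$N \operatorname{var}(\hat{\beta})$ and $N \operatorname{var}(\tilde{\beta})$ for Various Values of $\rho$ and $\beta$

                 $\rho$ = .4                          $\rho$ = .6                          $\rho$ = .8
$\beta$    $N \operatorname{var}(\hat{\beta})$  $N \operatorname{var}(\tilde{\beta})$    $N \operatorname{var}(\hat{\beta})$  $N \operatorname{var}(\tilde{\beta})$    $N \operatorname{var}(\hat{\beta})$  $N \operatorname{var}(\tilde{\beta})$

  0        1.351      1.571                1.146      1.571                 .925      1.571
 .5        1.612      1.721                1.480      1.721                1.396      1.721
1.0        2.417      2.280                2.668      2.280                3.375      2.280
1.5        4.745      3.716                6.466      3.716               10.650      3.716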











This table may be contrasted with Table 1. Whereas $G$ is a “more
efficient” estimate of $\rho$ than $r_{\mathrm{bis}}$ for all values of $\rho$ and $\beta$ under consideration,
we find that $\hat{\beta}$ is a “more efficient” estimate of $\beta$ than $\tilde{\beta}$ only for certain values
of $\rho$ and $\beta$.

2.10. An experiment was carried out in which 100 samples of 100
individuals each were drawn from a normal bivariate distribution with $\rho = .6$.
One variate was dichotomized with $\beta = .5$ and 100 $G$’s were calculated.

FIGURE 2

Distribution of 100 Samples of G (100 Cases in Each Sample) from Normal Population
in Which $\rho$ = .6

Figure 2 shows a histogram of the observed distribution of 100 $G$’s. We also
have the results shown in Table 5.

In view of the good agreement between observed and expected figures 
in this example and the fact that the distribution of G (see Fig. 2) is not 
very skew, it seems reasonable to assume that for N = 200 and more the 
sampling distribution of G will be very nearly normal. 








TABLE 5

Observed and Theoretical Values in 100 Samples from Normal Bivariate Population
with $\rho$ = .6 (100 Cases in Each Sample)

                          Observed    Theoretical

Mean G                     .610        .600
Mean r_bis                 .616        .600
Mean $\hat{\beta}$              .505        .500
Mean $\tilde{\beta}$            .506        .500
var(G)                     .0084       .0079
var(r_bis)                 .0094       .0091
var($\hat{\beta}$)              .0168       .0134
var($\tilde{\beta}$)            .0201       .0172





3. The restricted normal surface. 


3.1. A problem which arises quite often in practice is that of estimating 
the correlation between two variates when we have only a “restricted” 
sample available. This happens, for example, when candidates are selected 
for a certain course of training by some “screening” device so that final 
criterion follow-up data are not available for a random sample from the total 
population. 

If the restriction occurs in such a way that equation (1) still holds
although the marginal $x$-distribution may not be normal, we may still find
an estimate of $\rho$ in the total population by the procedure outlined in
paragraphs 2.4 through 2.9. An approximation to the standard error of $G$ may,
as before, be found, using equations (20) and (18).

3.2. We now give an example in which the $x$-distribution is truncated,
that is, corresponding to a practical situation in which only those applicants
are selected who score above a certain level in a test. A sample of 1000
individuals was drawn from a normal bivariate distribution with $\rho = .6$,
the $y$-variate being dichotomized at $\beta = +.5$. The $x$-distribution was then
truncated at $-.25$ (in standard measures), resulting in the data given in
Table 6.


TABLE 6

Sample from Normal Bivariate Population with $\rho$ = .6, Truncated at x = −.25 S.D.

  X        n      m      p (%)    Probit     Y        nw        y        nwX

3.424      1      1      100.0      ∞       6.92       .15     7.35       .51360
2.935      2      2      100.0      ∞       6.56       .50     7.06      1.46750
2.446      9      7       77.8     5.76     6.19      3.36     5.65      8.21856
1.961     30     24       80.0     5.84     5.84     14.71     5.84     28.84631
1.468     57     37       64.9     5.38     5.46     33.59     5.38     49.31012
 .979    136     76       55.9     5.15     5.09     86.33     5.15     84.51707
 .490    184     63       34.2     4.59     4.72    113.84     4.60     55.78160
 .000    189     51       27.0     4.39     4.35    103.02     4.39       .00000

Totals   608    261
The “provisional” regression line and the calculated line are shown in
Figure 3.

FIGURE 3

“Eye Fit” and Calculated Regression Line for Example Where x is Truncated

Using the $Y$-values from the provisional line and the corresponding
working probits, the following sums were calculated:

\[ \sum nw = 355.5000, \quad \sum nwX = 228.6548, \quad \sum nwy = 1710.7584, \quad \sum nwXy = 1186.1795, \quad \sum nwX^2 = 265.1983. \]

Using these sums, we find

\[ \hat{b} = \frac{85.8334}{118.1294} = .7266, \]

so that $G = .5878$.








The calculated regression line, shown in Figure 3, is

\[ Y = 4.3450 + .7266X. \]

We also have, approximately,

\[ \text{S.E.}(G) = .0487. \]

We may test our assumptions in paragraph 3.1 by calculating $\chi^2$ in the
same way as was done in Table 3. For this example $\chi^2 = 2.238$ for 3 d.f.
The estimate of $\beta$ using equation (21) is $\hat{\beta} = .5299$.

4. Conclusion. In conclusion we want to emphasize the fact that $G$ is
a consistent estimate of $\rho$ even when the sample is “restricted” as indicated
in paragraph 3.1. It is of course well known that in these circumstances $\rho$
cannot be estimated by $r_{\mathrm{bis}}$, so that it would always seem to be advisable
to use $G$ rather than $r_{\mathrm{bis}}$, because even when both are consistent estimates
of $\rho$, $G$ is the more efficient estimate.


REFERENCES 


1. Finney, D. J. Probit analysis. Cambridge: Cambridge Univ. Press, 1947.
2. Finney, D. J., and Stevens, W. L. Table for the calculation of working probits and
weights in probit analysis. Biometrika, 1948, 35, 191-201.
3. Gillman, L., and Goode, H. H. An estimate of the correlation coefficient of a
bivariate normal population when X is truncated and Y is dichotomized. Harv. educ.
Rev., 1946, 16, 52-55.
4. Sichel, H. S. First peace-time validation of army selection tests with a discussion of
some statistical problems encountered in this project. Bulletin of the National Institute
for Personnel Research of the South African Council for Scientific and Industrial
Research, 1950, 2, 4-35.
5. Tocher, K. D. A note on the analysis of grouped probit data. Biometrika, 1949, 36,
9-17.


Manuscript received 6/9/52 


Revised manuscript received 7/31/52 








PSYCHOMETRIKA—VOL. 18, No. 2 
JUNE, 1953 


A SIMPLE PROCEDURE FOR REARRANGING MATRICES* 


W. A. GIBSON


UNIVERSITY OF NORTH CAROLINA 


Guttman’s scalogram board technique for reordering the columns and 
rows of a matrix is described and its disadvantages are pointed out. A simple 
and inexpensive procedure for doing the same job without these disadvantages 
is outlined. 


There are a number of problems in psychometrics for whose solution it would
be helpful to have an easy and inexpensive method for altering the order
of the rows and columns of a matrix. One example of this kind of problem
is the attempt to find a Spearman hierarchy in a correlation matrix. Another
is cluster analysis or the selection of highly inter-related subgroups of tests
for such factoring procedures as the grouping or multiple-group methods.
Perhaps the most notable example is Guttman’s scalogram analysis.† In fact,
this paper might well have been entitled “A Cheap Scalogram Board.”

In scalogram analysis the matrix to be rearranged is the score matrix, 
in which the rows represent the members of the experimental sample, while 
the column headings are the response categories for questionnaire items. 
In any row of this matrix, the only cells which are filled are those correspond- 
ing to the response categories which the individual involved has endorsed. 
These entries are 1’s or X’s or check marks, while all other cells are regarded 
as containing zeros. The task of scalogram analysis is first to reorder the 
rows of this matrix and then to shuffle the columns in such a way as to come 
out with the closest approximation to a parallelogram pattern of the non- 
zero entries. To accomplish this, Guttman has invented the scalogram 
board, a device consisting of a rack which holds a hundred narrow strips 
of wood, each strip representing a row of the score matrix. There are a 
hundred recesses drilled into each strip to represent one hundred cells in 
that row of the matrix. The response pattern for any person is indicated 
by placing buckshot or small ball bearings in the recesses corresponding to 
the response categories he checks. When this is done for every person, 
the rows of the score matrix can be reordered to suit the investigator. When 
the time comes to interchange columns, a second board, identical with the 

*I am grateful to Professor Jozef Cohen of the University of Illinois for a five-minute 
conversation which greatly simplified the procedure described here. 

†Suchman, Edward A. The scalogram board technique for scale analysis. In
Samuel A. Stouffer, et al., Measurement and prediction. Princeton: Princeton Univ. Press,
1950. Ch. 4.




first, is placed upside down on top of the first board, with its wood strips 
at right angles to those of the first board. The two boards are then held 
together in that relationship and turned over, so that the balls in the first 
board fall into the corresponding recesses in the second board. The first 
board (which is now on top and completely empty of balls) is then removed, 
and the investigator can proceed to rearrange the columns of the score 
matrix, for the movable strips in the second board now represent those 
columns. 

Perhaps the main disadvantage of the scalogram board is its prohibitive 
cost, which is greatly increased by the precision of manufacture that is 
necessary in order that all balls from the first board will fall freely into the 
second when the two are turned over. Great uniformity is thus required 
in the spacing of the recesses and in the widths of the strips. Cost estimates 
run into the hundreds of dollars. A second disadvantage that might be
mentioned is the time required to place the balls in their proper positions 
in the board. Even the dropper which has been designed for this work* 
will not be nearly so fast as the placing of check marks in the proper cells 
of a data sheet. A third drawback of this method for rearranging matrices 
is that it is applicable to tables of qualitative data only. That is, the score 
matrix may show only presence or absence of an attribute (indicated by 
presence or absence of a ball in a recess) and cannot reflect quantitative 
differences such as are usually present in mental test score matrices and in 
correlation tables. This drawback is of no consequence for scalogram 
analysis itself, which deals exclusively with qualitative data, but it prevents 
the use of scalogram boards in reordering many other types of matrices. 
One possible way to overcome this defect would be to employ beads of different 
colors in place of the metal balls, but there would still be a limit to the number 
of differentiable colors that could be used. A fourth restriction is that the 
capacity of the board, in terms of the maximum number of columns and 
rows it can represent, is fixed once the board has been constructed. 

Let us now take up the description of a simple procedure which will 
overcome the disadvantages of the scalogram board that have been mentioned, 
while at the same time introducing no serious new drawbacks of its own. 
The matrix to be rearranged is first recorded on an ordinary data sheet. 
To avoid certain difficulties later on, it may prove desirable to utilize only 
every other column and row in this recording process. The completed 
data sheet is taken to an ordinary paper cutter and is cut into strips in such 
a way that each strip is a row of the original matrix. If the narrow strips 
of the data paper tend to twist or curl either immediately or after considerable 
handling, the sheet can be fastened, before cutting, to a piece of stiff card- 
board by strips of masking tape applied to its right and left edges. Great 
uniformity in the cutting process is unnecessary. The resulting strips can 


*Ibid., p. 96. 










be manipulated just as are the wood strips of the scalogram board. They 
can even be turned over or dropped on the floor without the numbers or 
check marks falling off. 

When the investigator is satisfied with his new ordering of the rows and 
is ready to shuffle the columns, he can align the row strips on a drawing board 
with any desired degree of care and fasten them all down, again using two 
strips of masking tape at the sides, or by some other means. A graduate 
assistant is then entrusted with the mission of carrying this material to the 
nearest photostating laboratory, where a photostatic negative is made. A 
positive is unnecessary. The negative is returned to the paper cutter and 
chopped into strips corresponding to the columns of the matrix. These 
strips can be manipulated freely to achieve a new ordering of the columns, 
and a second trip to the photostating office (or mere copying off) gives the 
reordered matrix in one piece. The second negative (a double negative 
yields a positive) could of course be cut into row strips if further reordering 
of the rows were indicated, and any number of additional cycles of this 
kind could easily be undertaken. 
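
The same alternation of row and column reordering can of course be imitated on a computing machine. The following Python sketch is offered only as an analogy to the paper-strip procedure: the function names are hypothetical, and ordering by marginal totals is merely one simple rule an investigator might try, not a rule prescribed by scalogram analysis.

```python
def reorder_rows(matrix, row_order):
    """Rearrange the row strips into the given order."""
    return [list(matrix[i]) for i in row_order]

def reorder_columns(matrix, col_order):
    """Rearrange the column strips into the given order."""
    return [[row[j] for j in col_order] for row in matrix]

def order_by_totals(totals):
    """Indices sorted from largest total to smallest (ties keep original order)."""
    return sorted(range(len(totals)), key=lambda i: -totals[i])

# A small score matrix: rows are persons, columns are response categories.
scores = [[1, 0, 1],
          [0, 0, 1],
          [1, 1, 1]]

# First cycle: reorder the rows, then the columns.
rows_done = reorder_rows(scores, order_by_totals([sum(r) for r in scores]))
col_totals = [sum(r[j] for r in rows_done) for j in range(len(rows_done[0]))]
both_done = reorder_columns(rows_done, order_by_totals(col_totals))

for row in both_done:
    print(row)
```

For this small matrix one cycle already yields a triangular pattern of the non-zero entries. Quantitative entries, such as correlations, can be carried through the same functions unchanged.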

The estimated total cost of a venture of this kind is in the neighborhood 
of a dollar or two, and if funds run out at any stage the photostating process 
can be replaced by manual transcription, which is especially simple for 
qualitative data. Very possibly some interesting variants of the procedure 
outlined here will occur to the reader.* 

*Mrs. Eleanore Narevsky, psychology department secretary at the University of 
Illinois, where this paper was drafted, has suggested that the hectographing process might 
very well be adapted to this task, and one of the reviewers suggests the use of Eastman dry 
mounting tissue and a sheet of cardboard to amalgamate the reordered row strips into a 


single sheet which can then be cut into column strips, thus eliminating the need for photo- 
stating. 


Manuscript received 3/26/52 


Revised manuscript received 5/8/52 








PSYCHOMETRIKA—VOL. 18, No. 2 
JUNE, 1953 


A TABLE FOR THE RAPID DETERMINATION OF THE 
TETRACHORIC CORRELATION COEFFICIENT* 


MELVIN D. DAVIDOFF AND HOWARD W. GOHEEN


U. S. CIVIL SERVICE COMMISSION


A table is developed and presented to facilitate the computation of the
Pearson $Q_3$ (“cosine method”) estimate of the tetrachoric correlation
coefficient. Data are presented concerning the accuracy of $Q_3$ as an estimate of the
tetrachoric correlation coefficient, and it is compared with the results
obtainable from the Chesire, Saffir, and Thurstone tables for the same fourfold
frequency tables.


Introduction 


The tetrachoric correlation coefficient has been extensively employed
since its introduction by Pearson (6) in 1901. Its adaptability to the easy
handling of certain kinds of data has made it a popular technique. The
tremendous amount of computational labor involved in handling the original
formula has motivated many persons to attack the problem of simplifying
the computational process. Pearson himself (6) derived several estimates
of the tetrachoric, one of which, $Q_3$, is the particular concern of this paper.
This estimate was chosen because of its general adequacy and more
particularly for its adaptability to the mathematical manipulations involved.
In slightly modified form, $Q_3$ has been frequently employed as the “cosine
method.” Chesire, Saffir, and Thurstone (1) collaborated in the development
of computing tables for the tetrachoric coefficient. Hamilton (3) presented
a nomogram based on the cosine method. Hayes (4) worked out tables
based on percentage differences, for which Goheen and Kavruck (2) offered
simplified work sheets. Jenkins (5) offered graphical methods for rapid
determination of the tetrachoric.

The authors some years ago saw a copyrighted and unpublished graph
for $r_{\mathrm{tet}}$ by Commander A. P. Webster, a Navy biologist. This graph was
entered with the parameter of the ratio of cross-products of the fourfold
table diagonal cells. The one graph was used for all cuts. The authors have
seen no derivation of this graph and had no idea of its theoretical origin.
The usefulness of such a parameter and the idea of employing only one
table instead of the large number involved in the Chesire, Saffir, and
Thurstone method interested us. We therefore set out to derive an estimate of

*The authors are indebted to Mr. John Scott, Chief of the Test Development Section
of the U.S. Civil Service Commission, for his encouragement and to Miss Elaine Ambrifi and
Mrs. Elaine Nixon for the large amount of computational work involved in this paper.




r_tet from the parameter mentioned above or from any parameter not involving
the position of the cuts on the variables. The r_tet values obtained are apparently
identical to those obtainable from the Webster graph, and the inference
we make is that Webster too worked from the Q3 estimate of Pearson.*

Pearson (6) made a brief assessment of Q3. In 15 trials he compared the
tetrachoric value with that given by his various estimates. Q3 had an average
absolute discrepancy of .021 from the actual tetrachoric value.† Further
manipulation of the cosine variation of Pearson’s empirical formula has
enabled the present authors to develop this table, which yields the
cosine-method values of Pearson’s Q3.


Derivation 
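With the cells of the fourfold table labeled as follows,

        b | a
       ---+---
        d | c

Pearson’s empirical formula for estimating the tetrachoric r was stated in
the form

$$Q_3 = \sin\left(\frac{\pi}{2}\cdot\frac{\sqrt{ad}-\sqrt{bc}}{\sqrt{ad}+\sqrt{bc}}\right). \tag{1}$$

Since the sine of an angle is equal to the cosine of its complement,

\begin{align*}
Q_3 &= \cos\left(\frac{\pi}{2}-\frac{\pi}{2}\cdot\frac{\sqrt{ad}-\sqrt{bc}}{\sqrt{ad}+\sqrt{bc}}\right) \\
    &= \cos\left[\frac{\pi}{2}\left(1-\frac{\sqrt{ad}-\sqrt{bc}}{\sqrt{ad}+\sqrt{bc}}\right)\right] \\
    &= \cos\left[\frac{\pi}{2}\left(\frac{2\sqrt{bc}}{\sqrt{ad}+\sqrt{bc}}\right)\right] \\
    &= \cos\left(\frac{\pi\sqrt{bc}}{\sqrt{ad}+\sqrt{bc}}\right). \tag{2}
\end{align*}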

This is the form which has been frequently employed under the name
“cosine method.”
The table presented in this article gives the actual Q3 or cosine method
value for each r_tet from .00 to 1.00. As has been indicated, one needs to


*This inference has been confirmed in recent correspondence with Commander
Webster.
†In the trial run every value of Q3 was higher than the corresponding tetrachoric
coefficient.










enter the table with only the value of ad/bc (or its reciprocal if it is larger),
thus greatly facilitating the determination of r_tet from a basic fourfold
table. This is achieved in the following manner:

Divide numerator and denominator of the angle of (2) by $\sqrt{bc}$:

$$Q_3 = \cos\left(\frac{\pi}{\sqrt{ad/bc}+1}\right).$$

To change from radian measure to degrees, multiply the angle by 180/π:

$$Q_3 = \cos\left(\frac{180^{\circ}}{1+\sqrt{ad/bc}}\right). \tag{3}$$

For ease in construction of the table, the following transformation was made:

\begin{align*}
\frac{180^{\circ}}{1+\sqrt{ad/bc}} &= \arccos Q_3, \\
1+\sqrt{\frac{ad}{bc}} &= \frac{180^{\circ}}{\arccos Q_3}, \\
\frac{ad}{bc} &= \left(\frac{180^{\circ}}{\arccos Q_3}-1\right)^{2}. \tag{4}
\end{align*}


Equation (4) was used in constructing the attached table. 
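
As an illustration of how equation (4) might be applied, the short Python sketch below converts a value of Q3 into the corresponding ratio ad/bc. The function name `ratio_from_q3` is hypothetical, and the use of half-step values such as .495 and .505 to bracket a tabled entry is an assumption made here for illustration; the paper does not state how the interval limits in the table were rounded.

```python
import math

def ratio_from_q3(q3: float) -> float:
    """Equation (4): ad/bc = (180 / arccos(Q3) - 1)**2, arccos taken in degrees."""
    if not 0.0 <= q3 < 1.0:
        raise ValueError("q3 must lie in [0, 1); the ratio is unbounded at 1")
    angle_deg = math.degrees(math.acos(q3))
    return (180.0 / angle_deg - 1.0) ** 2

# Q3 = .50 corresponds to an angle of 60 degrees, so ad/bc = (180/60 - 1)^2.
print(round(ratio_from_q3(0.50), 2))

# Assumed half-step boundaries around the tabled value .50.
print(round(ratio_from_q3(0.495), 2), round(ratio_from_q3(0.505), 2))
```

Under that assumption the two bracketing ratios can be compared with the interval printed beside .50 in Table 1.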

To use the table, set the data up in the fourfold table as indicated
earlier. Enter the table with the value ad/bc or its reciprocal (whichever is
the larger) and read the corresponding r_tet value. If the table is entered
with the reciprocal, the sign of the resultant r_tet will be negative. Since
the accuracy of the values given for r_tet does not extend beyond the second
decimal, interpolation between the values listed for ad/bc is not recommended.
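
The table lookup described above can also be sketched directly from the cosine formula. The Python function below is a minimal illustration, not the authors' procedure: the name `q3_from_fourfold` is hypothetical, and returning +1 or -1 when one cross-product is zero is an assumption (the paper marks the zero-cell case with a dash rather than a value). Because the formula is applied to ad/bc as it stands, a ratio below 1 yields a negative result without taking the reciprocal first.

```python
import math

def q3_from_fourfold(a: float, b: float, c: float, d: float) -> float:
    """Signed cosine-method estimate for the layout  b | a  over  d | c."""
    ad, bc = a * d, b * c
    if ad == 0 and bc == 0:
        raise ValueError("both cross-products are zero; the estimate is undefined")
    if bc == 0:
        return 1.0   # assumed limiting value as ad/bc grows without bound
    if ad == 0:
        return -1.0  # assumed limiting value as ad/bc approaches zero
    # Equation (3) written in radians: Q3 = cos(pi / (1 + sqrt(ad/bc))).
    return math.cos(math.pi / (1.0 + math.sqrt(ad / bc)))

# Cell proportions a = .16, b = .04, c = .34, d = .46 (marginals .2 and .5).
print(round(q3_from_fourfold(0.16, 0.04, 0.34, 0.46), 2))

# Interchanging the diagonals inverts ad/bc and reverses the sign.
print(round(q3_from_fourfold(0.04, 0.16, 0.46, 0.34), 2))
```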


The Accuracy of Q3 as an Estimate of r_tet


In this section we compare the actual r_tet, the value found in the Chesire,
Saffir, and Thurstone diagrams (r_T), and Q3. The procedure used in
this checking is as follows:

Various marginal totals are assumed as indicated below, and various
actual r_tet values are also assumed. The marginal totals and assumed r_tet
have been put into the Pearson formula for the actual r_tet:


$$\frac{2\pi\,(ad-bc)}{N^{2}\,e^{-(h^{2}+k^{2})/2}} = r_{tet} + \frac{hk}{2}\,r_{tet}^{2} + \frac{(h^{2}-1)(k^{2}-1)}{6}\,r_{tet}^{3} + \frac{hk\,(h^{2}-3)(k^{2}-3)}{24}\,r_{tet}^{4} + \cdots$$







TABLE 1

Pearson’s Q3 Estimates of r_tet for Various Values of ad/bc

r_tet   ad/bc          r_tet   ad/bc          r_tet   ad/bc
 .00    0-1.00          .35    2.49-2.55       .70    8.50-8.90
 .01    1.01-1.03       .36    2.56-2.63       .71    8.91-9.35
 .02    1.04-1.06       .37    2.64-2.71       .72    9.36-9.82
 .03    1.07-1.08       .38    2.72-2.79       .73    9.83-10.33
 .04    1.09-1.11       .39    2.80-2.87       .74    10.34-10.90
 .05    1.12-1.14       .40    2.88-2.96       .75    10.91-11.51
 .06    1.15-1.17       .41    2.97-3.05       .76    11.52-12.16
 .07    1.18-1.20       .42    3.06-3.14       .77    12.17-12.89
 .08    1.21-1.23       .43    3.15-3.24       .78    12.90-13.70
 .09    1.24-1.27       .44    3.25-3.34       .79    13.71-14.58
 .10    1.28-1.30       .45    3.35-3.45       .80    14.59-15.57
 .11    1.31-1.33       .46    3.46-3.56       .81    15.58-16.65
 .12    1.34-1.37       .47    3.57-3.68       .82    16.66-17.88
 .13    1.38-1.40       .48    3.69-3.80       .83    17.89-19.28
 .14    1.41-1.44       .49    3.81-3.92       .84    19.29-20.85
 .15    1.45-1.48       .50    3.93-4.06       .85    20.86-22.68
 .16    1.49-1.52       .51    4.07-4.20       .86    22.69-24.76
 .17    1.53-1.56       .52    4.21-4.34       .87    24.77-27.22
 .18    1.57-1.60       .53    4.35-4.49       .88    27.23-30.09
 .19    1.61-1.64       .54    4.50-4.66       .89    30.10-33.60
 .20    1.65-1.69       .55    4.67-4.82       .90    33.61-37.79
 .21    1.70-1.73       .56    4.83-4.99       .91    37.80-43.06
 .22    1.74-1.78       .57    5.00-5.18       .92    43.07-49.83
 .23    1.79-1.83       .58    5.19-5.38       .93    49.84-58.79
 .24    1.84-1.88       .59    5.39-5.59       .94    58.80-70.95
 .25    1.89-1.93       .60    5.60-5.80       .95    70.96-89.01
 .26    1.94-1.98       .61    5.81-6.03       .96    89.02-117.54
 .27    1.99-2.04       .62    6.04-6.28       .97    117.55-169.67
 .28    2.05-2.10       .63    6.29-6.54       .98    169.68-293.12
 .29    2.11-2.15       .64    6.55-6.81       .99    293.13-923.97
 .30    2.16-2.22       .65    6.82-7.10      1.00    923.98 —
 .31    2.23-2.28       .66    7.11-7.42
 .32    2.29-2.34       .67    7.43-7.75
 .33    2.35-2.41       .68    7.76-8.11
 .34    2.42-2.48       .69    8.12-8.49
















In actual use the left side of the equation was reduced as in Peters
and Van Voorhis (7, 369) to (ad − bc)/(zz′N²). Use of proportions in the cells of
the fourfold table eliminates the N² in the denominator. (ad − bc) is then
found from the equation. All the cell frequencies (in proportions) of the
fourfold table can then be found from our knowledge of (ad − bc), (a + b),
and (a + c). Using these cell frequencies we were able to obtain the corresponding
values of r_T and Q3.
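
The steps just described can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' worksheet: the function name `cells_from_marginals` is hypothetical, h and k are taken to be the unit-normal deviates that cut off upper-tail proportions (a + b) and (a + c), and the series is truncated after four terms. With proportions the four cells sum to 1, so ad − bc = a − (a + b)(a + c), which gives a directly.

```python
from statistics import NormalDist

def cells_from_marginals(p_ab: float, p_ac: float, r: float, terms: int = 4):
    """Return proportions (a, b, c, d) implied by marginals a+b, a+c and r_tet."""
    nd = NormalDist()
    # Assumed convention: h and k cut off upper-tail areas a+b and a+c.
    h = nd.inv_cdf(1.0 - p_ab)
    k = nd.inv_cdf(1.0 - p_ac)
    # Coefficients of r, r^2, r^3, r^4 in the tetrachoric series.
    coeffs = [
        1.0,
        h * k / 2.0,
        (h * h - 1.0) * (k * k - 1.0) / 6.0,
        h * k * (h * h - 3.0) * (k * k - 3.0) / 24.0,
    ]
    series = sum(coef * r ** (i + 1) for i, coef in enumerate(coeffs[:terms]))
    # Product of the normal ordinates at the two cuts times the series.
    ad_minus_bc = nd.pdf(h) * nd.pdf(k) * series
    # With proportions, ad - bc = a - (a+b)(a+c).
    a = ad_minus_bc + p_ab * p_ac
    b = p_ab - a
    c = p_ac - a
    d = 1.0 - a - b - c
    return a, b, c, d

cells = cells_from_marginals(0.2, 0.5, 0.2)
print([round(v, 2) for v in cells])
```

The rounded cells can then be passed to the cosine formula, or used to enter Table 1 with ad/bc, to obtain Q3 for the assumed r_tet.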

The marginal totals listed in the left-hand column of Table 2 give unique
values of the tetrachoric. Those listed in the right-hand column yield r_tet’s
identical with those for the corresponding marginal totals in the left-hand column.
They come about merely as a result of one of the following conditions:

a. column reflection,
b. row reflection,
c. row and column reflection,
d. interchange of (a + b) and (a + c).
TABLE 2

Marginal values       Marginal values yielding tetrachorics (r_tet’s)
studied               identical with the corresponding set of
(a+b), (a+c)          values on the left

.2, .2                [illegible]   .8, .8
.2, .3                .8, .3   .3, .8   .3, .2   [illegible]
.2, .5                .8, .5   .5, .8   [illegible]
.3, .3                [illegible]
.3, .5                [illegible]   .5, .7   [illegible]





Originally Table 3 was set up on the basis of three terms of the series.
Subsequently it occurred to us to check what changes would ensue if we used
four terms, and the operation was repeated. Among the values in the present
table, which is based on four terms of the series, 8 changes occurred. In all
cases these changes were due to very small changes (.01 or .02) in the values
of a, b, c, d. These very small changes, due in most instances to rounding in
the second decimal place, caused some fluctuation in both Q3 and r_T about
r_tet. Obviously, however, these changes were quite unreliable (because of the
small frequency changes causing them). No changes were noted that would alter
the judgment of the level of accuracy of the tetrachoric estimates involved.
The value in the table indicated by a dash (—) came about because of a zero
cell frequency. The actual issue in this table, however, is the comparison
of Q3 and r_T.


TABLE 3*

(a+b)  (a+c)   r_tet     a     b     c     d      r_T        Q3

 .2     .2      .2      .06   .14   .14   .66     .23        .27
                .3      .07   .13   .13   .67     .34        .38
                .5      .09   .11   .11   .69     .53        .57
                .7      .11   .09   .09   .71     .68        .72
                .8      .13   .07   .07   .73     .82        .84
                .9      .15   .05   .05   .75     .92        .92

 .2     .3      .2      .08   .12   .22   .58     .19        .22
                .3      .09   .11   .21   .59     .30        .32
                .5      .12   .08   .18   .62     .54        .57
                .7      .14   .06   .16   .64     .68        .71
                .8      .16   .04   .14   .66     .81        .83
                .9      .18   .02   .12   .68     .91        .93

 .2     .5      .2      .12   .08   .38   .42     .17        .20
                .3      .13   .07   .37   .43     .28        .30
                .5      .16   .04   .34   .46     .53        .59
                .7      .18   .02   .32   .48     .72        .78
                .8      .19   .01   .31   .49     .82        .88
                .9      .20   0     .30   .50     —          —

 .3     .3      .2      .12   .18   .18   .52     .24        .25
                .3      .13   .17   .17   .53     .31        .33
                .5      .16   .14   .14   .56     .52        .54
                .7      .19   .11   .11   .59     [illeg.]   [illeg.]
                .8      .21   .09   .09   .61     .80        .81
                .9      .23   .07   .07   .63     .87        .88

 .3     .5      .2      .18   .12   .32   .38     .21        .22
                .3      .19   .11   .31   .39     .30        .30
                .5      .22   .08   .28   .42     .49        .51
                .7      .25   .05   .25   .45     [illeg.]   .71
                .8      .27   .03   .23   .47     .80        .84
                .9      .29   .01   .21   .49     [illeg.]   [illeg.]

 .5     .5      .2      .28   .22   .22   .28     .18        .19
                .3      .30   .20   .20   .30     .31        .31
                .5      .33   .17   .17   .33     .48        .48
                .7      .37   .13   .13   .37     .68        .68
                .8      .39   .11   .11   .39     [illeg.]   [illeg.]
                .9      .41   .09   .09   .41     .84        .84

*r_tet was always assumed to be positive. The accuracy checks are, however, perfectly generalizable
to negative values.

Q3 and r_T are generally in close agreement. When they differ, Q3 always
seems to be greater than r_T. As would be expected, the best agreement
seems to be at a .5, .5 split.


REFERENCES 


1. Chesire, L., Saffir, M., and Thurstone, L. L. Computing diagrams for the tetrachoric 
correlation coefficient. Chicago: Univ. of Chicago Bookstore, 1933. 

2. Goheen, H. W., and Kavruck, S. A worksheet for tetrachoric r and standard error of 
tetrachoric r using Hayes diagrams and tables. Psychometrika, 1948, 13, 279-280. 

3. Hamilton, M. Nomogram for the tetrachoric correlation coefficient. Psychometrika, 1948, 
13, 259-269. 

4. Hayes, S. P., Jr. Diagrams for computing tetrachoric correlation coefficients from per- 
centage differences. Psychometrika, 1946, 11, 163-172. 

5. Jenkins, W. L. A single chart for tetrachoric r. Educ. psychol. Meas., 1950, 10, 142-144. 

6. Pearson, K. Mathematical contribution to the theory of evolution, VII. On the correla- 
tion of characters not quantitatively measurable. London: Philos. Trans. roy. Soc., 
195A, 1901. 

7. Peters, C. C., and Van Voorhis, W. R. Statistical procedures and their mathematical 
bases. New York: McGraw-Hill, 1941. 


Manuscript received 7/18/52 


Revised manuscript received 11/2/52 








PSYCHOMETRIKA—VOL. 18, No. 2 
JUNE, 1953 


A SPECIAL REVIEW OF 
HAROLD GULLIKSEN, THEORY OF MENTAL TESTS* 


LOUIS GUTTMAN


THE ISRAEL INSTITUTE OF APPLIED SOCIAL RESEARCH 


The most recent effort to integrate the sprawling statistical literature in 
mental testing is Theory of Mental Tests, by Harold Gulliksen. First the 
coverage of the book will be sketched, then treatment of the topics analyzed. 
Of the twenty main chapters, this reviewer classified twelve as being devoted 
primarily to reliability, four primarily to validity, three to scoring techniques, 
and one to item analysis. 

Reliability theory is introduced by adaptations of earlier algebraic treat- 
ments, and various conventional formulas ensue. Practical measures for mak- 
ing the needed observations are described and criticized. The reliability of 
speeded tests was first studied mathematically by Gulliksen himself, and his 
approach is described in one of the chapters. 

Validity is treated from the points of view of test length and group 
heterogeneity. Unusual attention is given to the case of selected populations 
and to estimated variances and/or covariances for the entire population. Dis- 
tinction is made between “explicit” and “incidental” selection; lack of this 
care has caused mistakes in previous literature. 

A theoretical framework is least apparent in the last four chapters, on 
scoring and on item analysis. This is the state of the extant literature: There 
has been virtually no attempt at a coherent theory for these topics. Item 
analysis is summed up as follows: “The striking characteristic of nearly all the
methods described [by earlier authors] is that no theory is presented showing
the relationship between the validity or reliability of the total test and the
method of item analysis suggested” (p. 363). Gulliksen’s treatment is intended
to pave the way to fill this lack.

Our basic criticisms of the book can be summarized in seven major points: 

(a) The theory of reliability is based on the notion of “parallel” tests.
This notion does not lead to a unique definition of the reliability of any given 
test and hence cannot serve as the basis for a universal theory of reliability. 

(b) No distinction is made between the algebraic consequences of dif- 
ferent concepts of reliability. This creates inconsistencies between and within 
the concepts and the algebra presented. 


*New York: John Wiley & Sons, 1950; xix + 486 pp., $6.00. 










(c) Retest theory* defines the most universal kind of error, and all other 
theories introduce additional, variously specialized, notions of deviation. 
Hence retest coefficients are upper bounds to all other types of reliability 
coefficients. The book perpetuates the interpretation that retest coefficients 
are “spuriously” large, instead of pointing out that this larger size must 
theoretically hold if all hypotheses are satisfied. 

(d) Only one full-fledged excursion is made into modern statistical theory 
—in connection with Wilks’ statistical test of parallelism of alternate forms. 
There is an incomplete excursion with respect to analysis of variance in 
Chapter 5; otherwise, old algebraic formulations are retained, with resulting 
inconsistencies in formulas. Most of the practical sampling problems of 
reliability and validity are not mentioned. 

(e) In general, reliability and validity are discussed in terms of test 
length. This implies that only a single universe of content is being studied 
and one which has a certain kind of structure. But most prediction problems 
in testing involve several universes of content and more complex structures. 

(f) Whereas an exact multivariate analysis is presented for the param- 
eters of multivariate selection, only bivariate techniques are advocated for 
item analysis for weighting problems that equally require a multivariate 
treatment. 

(g) The basic data of most mental tests are qualitative; yet no treatment 
is given of the theory of such qualitative data. Instead, an attempt is made to 
adapt to qualitative items least-squares theory appropriate to quantitative 
items. 

Our analysis will be divided into two main parts, one on the problem of 
reliability and one on validity, scoring, and item analysis. 


Reliability Theory 
1. The Formulation in Terms of Parallel Tests 


A general problem in a testing program is to avoid having the test ques- 
tions leak out in advance. One solution is to prepare two or three forms having 
the same content, so that groups tested on different days will get different 
forms. It is of interest, then, to know to what extent the various forms are
comparable.


*We shall mean by this what Cronbach calls “hypothetical retest with zero time.”
If a test is actually repeated twice on the same population, then each trial has its own
retest coefficient, since the situation may change between trials. The kind of coefficient we
call here “retest” implies no change in situation. “No change” can always be guaranteed
by making but one empirical trial under the conditions of interest; then one can use a 
formula for computing a lower bound to the retest coefficient. Lower bound formulas give 
correct information about what would happen in an infinite number of trials under un- 
changed conditions, and fortunately require but a single empirical trial for calculating their 
statistics (3). 










This problem has been rephrased by many writers in an attempt to 
construct a theory of reliability. The book before us gives what seems to be 
the most coherent of such approaches to reliability. 

With respect to the book’s definition of parallelism, it can be agreed that 
sets of tests can be constructed and people can be found such that the equa- 
tions of Chapter 3 can be satisfied. Tests are parallel if they have common 
means, variances, and intercorrelation coefficients. It is not so easy to see, 
however, that the definition is unique. It seems to this reviewer that one could 
find the same test to belong to more than one set of parallel tests and thus in 
general to have more than one “reliability coefficient.”

Consider the following example of a series of “parallel” tests. Let test 1
consist of but a single item: “Write down all the words you can think of that
begin with the letter t.” For a given population, and a given time limit, the
score for each person is the number of words he writes down beginning with t.

There are at least two different directions in which one could go to con- 
struct tests parallel to this one. One direction is to vary the letter involved. For 
example, test 2 could be: “Write down all the words you can think of that
begin with p,” while test 3 could use instead the letter d, say. By adjusting
the time limits, all three tests can be made to have the same mean. There 
seems no absolute barrier to their also having common variances and corre- 
lation coefficients. For our particular population, let us suppose the three 
tests are actually parallel, and that their common correlation coefficient is .70. 
Then, according to the book’s theory, test 1 has reliability coefficient .70. 

Another direction in which we could have gone to construct tests parallel 
to test 1 is to vary the places of the letter, and not the letter itself. Thus, test 2 
could be: “Write down all the words you can think of in which the second letter
is t,” and test 3 could ask for t as the third letter. Again, for our population,
there is no physical bar to the tests turning out to be parallel. But this time,
let us assume that the mutual intercorrelations turn out to be equal to .60.
Then test 1 has reliability .60.

Therefore, test 1 has reliabilities .70 and .60 simultaneously, according 
to the theory of parallelism. It should be noted, too, that Gulliksen’s nonalge- 
braic requirements are also satisfied simultaneously by each series. These 
additional requirements are “the tests should contain items dealing with the 
same subject matter, items of the same format, etc.” (pp. 173f).


2. The Relationship to Retest Reliability Theory 


Even were the book’s approach to yield a unique coefficient for some 
phenomena, this coefficient in general would differ from that ensuing from the 
test-retest theory. We can ask: What would happen if we were to repeat two 
parallel tests on the same population under the same conditions with no 
memory factor involved? Thus, each form is to have an experimentally inde- 
pendent retest. 










Gulliksen’s proposed coefficient will here be called the “communality
coefficient,” since its hypotheses are a specialized version of Spearman’s for
his single-common-factor theory. Both parallel tests have (by definition) the 
same communality coefficient, and this is in general less than each of the retest 
coefficients (regardless of whether the latter are mutually equal or not). This 
is a direct consequence of the formulas of retest theory (3), and has been long 
known in common-factor theory. 

We are interested in the reliability of a test, because it gives information 
as to the limitations the test may have in predicting other variables and yields 
information as to how well the test can possibly be predicted from other 
variables. The retest coefficient gives precisely these types of information, 
assuming only experimental independence between criterion and predictor, 
and only the retest approach has this generality. 


3. Relationship to the Analysis of Variance 


Consider the problem of experimental error in the analysis of variance. 
Hoyt (4) and others have shown that if the parallelism requirements of equal- 
ity of means, variances, and covariances are satisfied, then the resulting vari- 
ance of errors of unreliability is precisely the residual or experimental error 
in the sense of analysis of variance. Indeed, the parallelism equations provide 
the special case where both common-factor theory and analysis of variance 
are identical. If the equations are not satisfied, then one might go on to a Spear- 
man analysis or more generally to a multiple-factor analysis. If the equations 
are satisfied, one can continue in the standard sense of analysis of variance, 
and seek additional sources of variation. The size of a residual error depends 
in large part upon the sources of variation included in the analysis. The book 
itself later on indicates the reader of a test as a source of variation; many 
other sources can be studied along the usual lines of analysis of variance— 
provided the equations can be extended to hold for the additional sources— 
thereby reducing the experimental error. However, experimental error can 
never be reduced beyond that obtained by a strict replication or retest under 
precisely identical conditions. 

That the book’s reliability theory is aimed at the problem of universal 
predictability is stated explicitly for the first time in Chapter 14: “In addition, 
parallel tests should have equal validities for predicting any criterion” (p. 181). 
Votaw’s sampling theory of compound symmetry is discussed in this connec- 
tion. Now, we have already seen that a test can belong simultaneously to 
more than one parallel set. In such a case, it clearly can have different validi- 
ties from any test parallel to it in a given set. Hence, uniform validities for 
any criterion can not be expected. Votaw’s sampling theory is appropriate to 
the administrative problem that originally gave rise to the concept of parallel 
tests: Given two forms and a particular criterion, can we interchange the 
forms to avoid cheating and yet obtain comparable results? All that is needed 
is comparable validity, which can be attained without parallelism. 










4. On Some of the Algebraic Derivations 

With respect to some of the specific algebraic derivations, the following 
points were noted which seem to require amendments. 

Chapter 4 carefully distinguishes between errors of measurement and 
errors of prediction, but earlier—in Chapter 2—these two seem to have been 
confused. Chapter 2 begins with three variables: X or the observed value, T or
the true value, and E or the error. It is assumed that

$$X_i = T_i + E_i, \tag{1}$$

where the subscript i denotes the ith respondent. These three variables can
also conveniently be regarded to be in deviate form, with zero means, and are
denoted then by lower case letters x, t, and e, respectively. The book states
that “no assertion regarding probability can be made” for estimating true
scores from observed scores (p. 19). Since it is the population of respondents
that is involved, however, “probability” has the direct meaning of being the
proportion of the testees with true scores in the specified interval. The point
estimate t′ of a true score t from an observed score x can be given in deviate
form by the following regression:

$$t' = r_{xt}\,\frac{s_t}{s_x}\,x = \frac{s_t^{2}}{s_x^{2}}\,x = r_{xt}^{2}\,x, \tag{2}$$

and its standard error of estimate can be calculated as

$$s_{t\cdot x} = s_t\sqrt{1-r_{xt}^{2}} = r_{xt}\,s_e. \tag{3}$$

Instead of (2), the book has erroneously implied that

$$t' = x, \tag{4}$$

and has erroneously used the variance $s_e^{2}$ instead of (3).
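
A small numerical sketch may make the contrast concrete. The Python function below is an illustration only: the name `true_score_estimate` and the numerical values are hypothetical, and it assumes that true and error components are uncorrelated, so that the observed variance is the sum of the true and error variances. It computes the regression estimate of a true deviate score and the standard error of that estimate, which can be set beside the observed score x and the error standard deviation.

```python
import math

def true_score_estimate(x: float, s_t: float, s_e: float):
    """Regression estimate of a true deviate score and its standard error."""
    # Assumes r_te = 0, so s_x^2 = s_t^2 + s_e^2.
    s_x = math.sqrt(s_t ** 2 + s_e ** 2)
    r_xt = s_t / s_x                            # correlation of observed with true
    t_hat = r_xt ** 2 * x                       # regression estimate, r_xt^2 * x
    se_est = s_t * math.sqrt(1.0 - r_xt ** 2)   # standard error of estimate
    return t_hat, se_est, r_xt

# Hypothetical values: true s.d. 4, error s.d. 3, observed deviate score 10.
t_hat, se_est, r_xt = true_score_estimate(x=10.0, s_t=4.0, s_e=3.0)
print(round(t_hat, 2), round(se_est, 2), round(r_xt * 3.0, 2))
```

In this hypothetical case the regression estimate is pulled toward the mean relative to x, and the standard error of estimate is smaller than the error standard deviation.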

Equation (4) is typical of an actual theory of reliability concerned with 
what are usually called “errors of measurement.” Chapter 2 is restricted to 
equation (1), with the conditions that 


$$M_e = r_{te} = r_{e_1 e_2} = 0. \tag{5}$$


Conditions (5) permit only a theory of errors of prediction, and not of errors of 
measurement. Equation (4) cannot be derived from (1) and (5). A wider frame 
of reference is needed, involving a universe of experiments as well as a popula- 
tion of respondents. 

Chapter 3 does develop a real reliability theory for a given test, using an 
infinite universe of parallel tests as the frame of reference. Its “communality”
reliability coefficient depends on the situation, the population, and—as pointed 
out above—the universe of content. 

The Kuder-Richardson approach, referred to in Chapter 16, also lacks 
an adequate frame of reference. This can be seen immediately from the mere 










fact that it gives no formula for a test composed of but a single item. Chapter 
16 uses the Jackson-Ferguson derivation of the Kuder-Richardson “formula 
20,” which uses an assumption that begins in mid-air, with no means of testing 
its mathematical consistency. Within the frame of reference of retest theory, 
it is easy to prove that the Jackson-Ferguson assumption is false in general. 

While the theory of parallel tests is an actual reliability theory, in the 
sense of equation (4) above, it has a further limitation in not leading directly 
to a solution to the problem of speeded tests. Gulliksen has been the first to 
tackle systematically this latter problem, and his conclusions are admittedly 
tentative. To this reviewer, it appears that the argument developed in Chapter 
17 does not flow directly out of the framework of parallelism, but that some 
additional hypotheses have been inserted whose consistency is not clear. In 
particular, the algebra and concepts seem to get blurred with the introduction
of “split-half estimates of reliability” on the bottom of page 234, leaving the
justification of the final formulas in doubt. 


Prediction, Scoring, and Item Analysis 


5. The Need for Modern Statistical Theory 

The Preface to Theory of Mental Tests indicates that the book is directed 
to readers who already possess familiarity with elementary statistics and with 
tests of significance (p. vii). The list of symbols (p. xi) carefully distinguishes 
between sample statistics and population parameters. This distinction seems 
not to have been made, however, from the outset of the algebra in Chapter 2 
and throughout the book, except for Chapter 14. 

From a pedagogical point of view, it is a doubtful procedure to teach
students that “over a sufficiently large number of cases the average error [is]
zero. . . . In actual practice however, it is customary to assume [a zero average]
for any particular sample that is being considered” (pp. 6-7).

The book’s treatment would have been consistent had it been confined to 
parameters based on an indefinitely large population. Then sampling problems 
could have been treated as they should be, along the style of Chapter 14. 

In some chapters following the special problem of Chapter 14, the need is 
pointed out for a real sampling theory, the case being stated especially well 
with respect to current hodge-podge item-analysis techniques. For other 
problems, it is not made so clear to what extent their solution depends upon 
a sampling theory. 

It is especially with multivariate prediction problems that there is great 
danger in disregarding sampling errors for samples of the size so often used in 
practice—say 100 to 300 cases. Gulliksen cites the example wherein a multiple 
regression with a correlation coefficient of .73 in one sample of 150 cases yielded 
a correlation of zero in a second identical kind of sample. Simple scoring in 
this case held up better from sample to sample, yielding a correlation of 
about .25 in both instances. 










The same danger attends the problem of multivariate selection in Chapter
10. When is a simple scoring procedure better than the one advocated there? 
Unless we know something about the answer to this question, it is doubtful 
whether the multivariate formulas should be used in practice. The same holds, 
perhaps to a lesser extent, for the scoring techniques for maximizing the
“internal consistency” and “reliability” of composites.

The uncertainty as to whether samples or populations are meant arises 
again with respect to the discussion of percentiles versus standard and other 
scores in Chapter 19. 

Sampling problems of the kind mentioned here are largely in the province 
of conventional mathematical statistics. They are numerous and important; 
it is to be hoped that mathematical statisticians will be encouraged to tackle 
them. 


6. Qualitative Data and Structural Analysis 


Current sampling theory by itself cannot solve many problems of predic- 
tion and external validity. Conventional sampling problems concern the 
selection of people from a large population. Mental test theory faces also 
another type of sampling problem—that of selecting items from one or more 
indefinitely large universes of content. This is a basic problem of item analysis. 
To this reviewer it appears that there can be no solution without a structural 
theory. 

Gulliksen develops one of the most rational of prevalent theories of item 
analysis in Chapter 21. That it is rational is shown by the fact that it reveals 
some of its own shortcomings, which Gulliksen carefully points out. That it is 
not entirely rational is shown in part by the central role it gives the Kuder- 
Richardson formula (20), with the attendant confusion as to the meaning of 
“reliability” and whether or not an underestimate is involved instead of an 
estimate. In Chapter 16 it is stated that it has been “demonstrated that the 
value given by [K-R formula (20)] is a lower bound to the reliability coefficient”
(p. 224). No cognizance is taken of this in Chapter 21, nor is it stated
what theory of reliability is intended. 

Another shortcoming of the approach stems from restricting one’s self 
only to average variances and covariances, and not to a structural theory of 
the intercorrelations. Consider the following example of what is known from 
linear multiple correlation theory. Let two predictors be correlated .60 with 
each other. Then it is better that the criterion correlations be .80 and .00 than 
that both be .40. The first case yields a perfect multiple correlation (= 1.00) 
with the criterion, while the second yields a multiple correlation of only .45. 
Clearly, just knowledge of the average intercorrelations of items with a 
criterion can be of little use in item selection. 
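
The two cases can be checked with the standard two-predictor formula for the squared multiple correlation, R² = (r₁² + r₂² − 2 r₁ r₂ r₁₂)/(1 − r₁₂²). The Python sketch below is an illustration; the function name `multiple_r` is hypothetical.

```python
import math

def multiple_r(r1: float, r2: float, r12: float) -> float:
    """Multiple correlation of a criterion with two predictors.

    r1, r2 are the criterion correlations; r12 is the predictor intercorrelation.
    """
    r_squared = (r1 ** 2 + r2 ** 2 - 2.0 * r1 * r2 * r12) / (1.0 - r12 ** 2)
    return math.sqrt(r_squared)

# Criterion correlations .80 and .00, predictors correlated .60.
print(round(multiple_r(0.80, 0.00, 0.60), 2))

# Criterion correlations both .40, predictors correlated .60.
print(round(multiple_r(0.40, 0.40, 0.60), 2))
```

Both cases have the same average criterion correlation of .40, yet the multiple correlations differ sharply, which is the point of the example.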

The example just described is borrowed from the theory of linear least 
squares. The same kind of argument holds, with perhaps even greater force, 
for the kind of data with which Chapter 21 deals. Gulliksen has limited him- 










self to the case of dichotomous items. It is not clear why he has not treated 
the data as the qualitative dichotomies they are, but instead has used linear 
least-squares theory. The role of the marginal distributions of each item 
separately should properly come into prominence here. Indeed, for the special 
structures of perfect scales and some kinds of quasi-scales, the marginals tell 
almost the whole story for item selection for prediction purposes,* for they 
determine the intercorrelations among the items. It seems clear that struc- 
tural theories like those already known for scales (and others now in prepara- 
tion for various kinds of non-scales) are needed before any coherent approach 
can be had for item analysis.


REFERENCES 
1. Gulliksen, Harold. Theory of mental tests. New York: John Wiley & Sons, 1950. 
2. Guttman, Louis. Multiple rectilinear prediction and the resolution into components. 
Psychometrika, 1940, 5, 75-99. 
3. Guttman, Louis. A basis for analyzing test-retest reliability. Psychometrika, 1945, 10, 
255-282. 
4. Hoyt, Cyril. Test reliability estimated by the analysis of variance. Psychometrika, 1941, 
6, 153-160. 
*This is to be distinguished from the purpose of defining a universe of content. Scale
analysis, or any other statistical technique, is not to be regarded as an appropriate method
of item selection for defining content. But after the universe is defined, its structure can
be studied statistically and then items can be easily selected for efficient use for any given
purpose.


Manuscript received 1/26/52 


Revised manuscript received 9/28/52 








PSYCHOMETRIKA—VOL. 18, No. 2 
JUNE, 1953 


COMMENTS ON GUTTMAN’S REVIEW OF 
THEORY OF MENTAL TESTS 


HAROLD GULLIKSEN


EDUCATIONAL TESTING SERVICE 


Dr. Guttman’s review of Theory of Mental Tests is essentially an 
attempt to indicate the main avenues along which he would like to see con- 
tributions made to test theory. 

My aim in writing Theory of Mental Tests was to summarize the major 
areas of the literature in the field, to indicate some of the major areas for 
needed work, and to make some progress toward a unified theory. Guttman’s 
review indicates both that these objectives were fulfilled and that much still 
remains to be done. 

For purposes of this discussion the principal adverse comments on the 
book will be grouped under three major criticisms. 

1. Reliability is treated primarily from the standpoint of parallel tests 
which cannot yield a unique coefficient and only secondarily from the retest 
viewpoint that has been developed by Guttman. 

2. Test theory has not been developed in terms of modern statistical 
theory. 

3. Item analysis and item selection are presented in terms of a bivariate 
theory rather than in terms of a multivariate structural theory such as 
Guttman’s theory of scales and quasi-scales.


Reliability 


The value of a “parallel form” reliability lies not only in dealing with 
the practical problem of students becoming familiar in advance with the test 
questions, as Guttman implies, but also in the fact that it is the only feasible 
way yet suggested for dealing with reliability of pure speed tests and partly 
speeded tests. 

As to the lack of uniqueness, it seems to me quite appropriate that one 
important measure of reliability should vary with the skill of the test con- 
structor or with his idea of what he is measuring. The ability to construct 
parallel tests is an important one and should not be lost sight of in the statisti- 
cal mazes of analysis of variance. 

For example, suppose we have two tests, one of verbal reasoning and one 
of spatial visualization, both with “retest coefficients” of .95. We find, however, 
that when parallel forms are constructed the two verbal reasoning tests correlate
.93 while the two spatial tests correlate .71; clearly this fact indicates an
important difference between the two fields or between our grasp of the two 
fields which calls for further investigation. The “parallel form” reliability is
a fundamental concept not only from the practical but also from the theoret- 
ical viewpoint. 

The assumption utilized by Jackson and Ferguson in their development 
of the Kuder-Richardson theory is asserted by Guttman to be “demonstrably
false in general.” In reality, however, this assumption is false only in the 
sense in which any approximation is false. The point which should be em- 
phasized is that under many conditions often encountered in testing work 
the assumption discussed above gives a usable and valuable approximation 
to the reliability of the test. 


Statistical Theory 

In his Section 4, “On Some of the Algebraic Derivations,” Guttman 
presents his formula (2) for the estimate of a true score and formula (3) for the 
standard error of this estimate. These formulas are equivalent respectively 
to formulas (21) and (24) which I presented in Chapter 4. However, this 
least-squares approach did not seem to me to be worth developing in greater 
detail then, since a more thorough reconsideration of the foundations of test 
theory was in preparation by Frederic Lord.* 

In dealing with the problems of multivariate selection, I made a start 
at developing invariant parameters. This approach seems to me a better one 
than the “correction for restriction of range,” “correction for attenuation,”
etc. The beginnings of such an approach are overlooked in Guttman’s comments.
I would feel that this approach should eventually supplant the various
so-called “corrections.” 

Guttman’s suggestion that the book should have stated that the treatment
in general was for “parameters based on an indefinitely large population”
(italics mine) is an excellent one which provides a uniform and accurate pro- 
cedure for presenting theoretical material in a field where sampling theory is 
still to be developed. As Guttman points out, the solution for sampling prob- 
lems may then be introduced wherever it is available. 

I am definitely in agreement with the view that there is a need for a 
development of test theory more closely related to modern statistical theory. 
Many of these statistical sampling problems have now been indicated in vari- 
ous places, including Theory of Mental Tests and Guttman’s present review. 
The need for such work has been amply stated. My hope now is that those who 
are competent in mathematical statistics will aid in advancing test theory 
not only by indicating that psychologists have not yet solved these problems 
but also by presenting solutions to some of the problems that have been 
pointed out. 


*Lord, Frederic M., A theory of test scores. Psychometric Monograph No. 7, 1952. 










Item Analysis 

Guttman criticizes my approach to item analysis as being essentially a 
“bivariate and quantitative” theory rather than a structural and qualitative 
theory. It should be pointed out again that the theory I developed follows 
basically from the procedure of finding total score on a test by assigning a 
“0” or a “1” to the answers and adding the “1’s” to find the total score. As
long as this procedure is followed it seems appropriate to use a quantitative 
approach and inappropriate to insist that the item dichotomies are really 
qualitative. The complete multivariate theory seems unfeasible at present, 
so various simplifying assumptions or approximations are used. I chose one 
of these and worked out some of the consequences. 

As to the comment that for perfect scales or quasi-scales the marginals 
tell almost the whole story and determine intercorrelations, this statement 
does not apply to the usual achievement or aptitude test situation. One would 
be badly misled in selecting achievement or aptitude test items on the basis 
of the Guttman scaling theory assumption that “marginals tell almost the
whole story for item selection for prediction purposes, for they determine the
intercorrelations among the items.” This assumption, however, seems to have
been used successfully for attitude and interest scales by the Education and 
Information Branch of the Adjutant General’s Office. 

It is my feeling that many different types of cases should be worked out 
so that some theory will be available for various situations. For any given 
situation one would use the assumption which seemed the nearest to that 
situation. The item-analysis view presented in Theory of Mental Tests is a 
usable procedure for aptitude and achievement tests. 


Manuscript received 2/13/53 














PSYCHOMETRIKA—VOL. 18, No. 2 
JUNE, 1953 


A FACTOR-ANALYTIC STUDY OF REASONING ABILITIES 


RUSSEL F. GREEN
UNIVERSITY OF ROCHESTER
J. P. GUILFORD, PAUL R. CHRISTENSEN
UNIVERSITY OF SOUTHERN CALIFORNIA
AND
ANDREW L. COMREY
UNIVERSITY OF CALIFORNIA AT LOS ANGELES


A battery of 32 tests was administered to a sample including 144 Air 
Force Officer Candidates and 139 Air Cadets. The factor analysis, using 
Thurstone’s complete centroid method and Zimmerman’s graphic method 
of orthogonal rotations, revealed 12 interpretable factors. The non-reasoning 
factors were interpreted as verbal comprehension, numerical facility, per- 
ceptual speed, visualization, and spatial orientation. The factors derived from 
reasoning tests were identified as general reasoning, logical reasoning, eduction 
of perceptual relations, eduction of conceptual relations, eduction of conceptual
patterns, eduction of correlates, and symbol substitution. The logical-reasoning 
factor corresponds to what has been called deduction, but eduction of 
correlates is perhaps closer to an ability actually to make deductions. The 
area called induction appears to resolve into three eduction-of-relations 
factors. Reasoning factors do not appear always to transcend the type of 
test material used. 


This is the first in a series of studies designed to explore abilities con- 
sidered to be important in the success of high-level personnel.* In this 
study an attempt was made to isolate and to define more precisely primary 
abilities in the domain of reasoning.† The existence of several distinct
reasoning factors is generally accepted, but their number and definitions 
are by no means clear. 


Review of Recent Findings and Current Hypotheses 


L. L. Thurstone conducted one of the first studies that brought out
factors defined as reasoning (9). He defined an induction and a deduction
factor. He thought that induction items require the examinee to find rules
or principles, whereas deductive items require him to apply rules or principles.
He also tentatively proposed a restrictive-reasoning factor, with an
arithmetic-reasoning test as its chief referent. He thought that such problems
restrict the channels of reasoning by the conditions prescribed.

*Under Contract N6onr-23810 with the Office of Naval Research. The views expressed
here are not necessarily shared by the Office of Naval Research. These studies are under
the general direction of J. P. Guilford. P. R. Christensen is assistant director. A. L. Comrey
was in direct charge of this study during its early stages and R. F. Green during most
of its progress.

†Lack of space prevents our reporting all phases of this study in detail here. For
more detailed information see (2, 4, 5).

The Army Air Forces Aviation Psychology Research Report No. 5 
describes several analyses leading to the conclusion that there are three 
reasoning factors (3). These were rather non-committally designated as 
reasoning I, II, and III. Reasoning I was best characterized by arithmetic- 
reasoning tests. It appeared with variances in nearly all reasoning tests, 
and hence was called “general reasoning.” It quite often crept into non-reasoning
tests, especially when they became difficult for the examinees.
Reasoning II was most characteristic of a figure-matrix test, of the type of 
Raven’s Progressive Matrices. It bears some resemblance to Thurstone’s 
induction factor. Reasoning III was most characteristic of a figure-classifica- 
tion test in which one of five figures must be selected because it has certain 
properties in common with three other figures. No evidence of a deduction 
factor appeared in the AAF results, probably because there were no strongly 
definitive tests for it in the analyses. 

Blakey performed an analysis especially aimed at reasoning abilities (1). 
He found only two factors identified as reasoning: induction and deduction. 
His battery included only eleven tests, however, probably too few to define 
the whole domain of reasoning. 

Zimmerman re-rotated the reference axes of Thurstone’s initial study 
and analyzed the matrices involved in two other AAF studies (12). He 
identified three reasoning factors, one of which appeared to be a classifying 
ability represented by Thurstone’s Sound Grouping test and his Figure 
Classification test. This suggests the identification of the classification 
factor with the AAF reasoning III. Thurstone’s restrictive factor was identified
with the AAF reasoning I. Thurstone’s deduction factor was confirmed, 
but it was concluded that some change will probably be needed in its definition. 


Reasoning Hypotheses Postulated for This Study

On the basis of previous findings and their implications, four reasoning 
factors were assumed to exist. They correspond to the three AAF factors 
plus Thurstone’s factor of deduction. Several different sub-hypotheses were 
set up as to the more precise nature of each of the four factors. These hypothe- 
ses and sub-hypotheses have served as the logical base for what was recognized 
as a continuing research program in which this study is an exploratory 
investigation. We hoped, in this study, to answer only a few questions: 
(1) whether four factors are sufficient in number to account for the recognized 
domain of reasoning tests; (2) if so, whether the four correspond to the four 
expected; and (3) whether, if they did, some of the special hypotheses concerning
their properties are better supported than others. In spite of the fact that
we were not able to test all sub-hypotheses in this study, we present a complete 
account of them as the framework for a program of research. From an opera- 
tional point of view, the sub-hypotheses served as starting points for test 
ideas. One aim was to diversify the types of reasoning tests as much as possible. 
The sub-hypotheses served well as a means to this end. 

Because of the involvement of reasoning I in so many tests, this factor 
might be a very general ability to manipulate symbolic material. A some- 
what more restrictive hypothesis assumes that reasoning I is a general 
ability to solve problems. Not all thinking is problem solving. If it can be 
shown that this factor is coextensive with tests that do pose problems, this 
second hypothesis would be supported. A still more restrictive hypothesis is 
that reasoning I is the ability to define, formulate, or structure given problems. 
This is an essential step in all arithmetic-reasoning items. The examinee 
must grasp the set of conditions and realize their inter-relationships and their 
contributions to the finding of a solution. A fourth conception of the factor 
is that it is an ability to test hypotheses. The previous conception—ability 
to define the problem—is a matter of forming hypotheses. Several false 
starts are typically made before the correct nature of the problem is grasped. 
The more quickly such errors can be rejected, the better the performance. 
Having correctly conceived the problem and having rejected wrong hypothe- 
ses, something more remains to be done. There must be a sequence of steps 
organized in order to arrive at the answer. A fifth hypothesis concerning 
reasoning I is accordingly that it is the ability to organize such a sequence 
of steps. Any one of the last three hypotheses could be supported if it turns 
out that tests which feature the kind of operation implied have higher loadings 
in reasoning I than has arithmetic reasoning. 

The favored hypothesis concerning the nature of reasoning II is that 
it is what Thurstone called induction: the ability to see rules or principles in
a set of objects. The concepts of “rules” and “principles” are rather general 
and perhaps more vague than is desired. There are many kinds of rules and 
principles. It may be that reasoning II can be more precisely defined as an 
ability to see certain kinds of rules and principles. With this thought in 
mind, and with the fact that a figure-matrix test is a typical measure of 
reasoning II, several more specific hypotheses have been formulated. They 
take a more analytical view of the examinee’s task. 

A figure matrix may be conceived by the examinee as presenting a 
system. Each item presents a different system which he must grasp. This 
idea places the emphasis upon the totality of the thing perceived. A second 
specific hypothesis is that reasoning II is an ability to see trends in a series 
of objects. In the matrix test, changes along rows and along columns are 
progressive. A trend may also be regarded as a set of relationships, the same 
relationship being repeated between successive neighbors of a series. A
hypothesis that reasoning II is a more general ability to see relationships is
therefore suggested. This hypothesis is also suggested by the fact that reasoning
II has a component variance in figure-analogies tests. In such a test
only pairs of objects are involved. It is possible to think of a trend being 
established by a series of two objects, but this is ordinarily conceived as the 
seeing of a relationship. In both figure-analogies tests and in figure-matrix 
tests, however, something more than seeing a relationship is required. The 
examinee must realize that the same identical relationship exists between 
other pairs or runs of objects. Another hypothesis, therefore, emphasizes 
the identifying of the same relationship in different settings. 

One additional hypothesis concerning reasoning II was considered in 
view of one fact. A Gottschaldt-figure test was found to have some variance 
in reasoning II in the AAF results. This test involves the perceptual analysis 
of a closed figure out of a larger closed figure in which it is embedded. If we 
consider what this test has in common with figure-matrix tests, the hypothesis 
suggested is that of a form-analysis ability of some kind, assuming that we 
can regard a figure matrix as a complex form that must be analyzed. Thur- 
stone has found an important perceptual factor in a Gottschaldt-figure test 
which he defined as closure against distracting material (10). It would be 
interesting to find that there is a closure factor common to both perceptual 
and symbolic material. Gestalt psychology would predict such abilities 
cutting across perceptual and thinking domains. 

Reasoning III is regarded as some kind of a classifying ability. The 
formation of a class idea depends upon seeing elements or properties that are 
common to a collection of objects. This act seems central and essential. 
Reasoning III may therefore be an ability to see common elements or proper- 
ties. Tests of reasoning III, however, have usually involved more than seeing 
common properties. They have, for example, required the exclusion from a 
list of objects of one not having the common properties—a classifying (or de- 
classifying) act. The question also arises as to whether the ability is of a 
very general nature, extending to all kinds of objects, or whether it is confined 
to figures or to a limited number of kinds of objects. These contingencies 
have been provided for in two hypotheses. 

In the AAF results, reasoning III was a consistent contributor to vari- 
ance in figure-analogies tests. It is not very easy to see a classifying act in 
an item of that test, unless it comes in the final step. After the examinee has 
in mind the third figure and the relation to be fulfilled, the fourth figure is 
of a certain kind required to fulfill that relation. Which of the alternative 
answers has those properties, or falls in that class? This final act seems 
better described by the Spearman concept of “eduction of a correlate,” 
however. This suggests the hypothesis that reasoning III is the ability to
educe a correlate. This kind of act is not so clear in classification tests, but 
is tolerated here as a possibility. 
















The leading hypothesis for reasoning IV is that it is a general ability 
to draw correct inferences from premises. In other words, it is deduction. 
There is still the unsettled question, however, as to the generality of this 
factor. Previous results show that the factor’s leading variances are in formal, 
syllogistic tests. It still remains to be established whether more informal 
deductive tests will show as much variance. The rival hypothesis is that 
reasoning IV is merely a syllogistic-reasoning ability. 

The hypotheses may be summarized in the following outline for the sake 
of ready reference. In describing tests we will make references to the hypothe- 
sized abilities that are probably emphasized by those tests. 


Reasoning I
   a. Manipulating symbols
   b. Solving problems
   c. Defining problems
   d. Testing hypotheses
   e. Organizing a sequence of related steps

Reasoning II
   a. Seeing rules or principles (induction)
   b. Seeing systems
   c. Seeing trends
   d. Seeing relations (educing relations)
   e. Seeing identity of relationships
   f. Analyzing forms

Reasoning III
   a. Seeing common elements or properties
   b. Classifying (in general)
   c. Classifying forms
   d. Educing correlates

Reasoning IV
   a. Drawing inferences (deduction)
   b. Syllogistic reasoning


The Tests 


Several tests were constructed especially for this study.* In addition 
to the hypotheses, various considerations helped to determine the kinds of 
tests constructed. Possibly the most important consideration was that all 
the reasoning tests were made quite short, containing from 10 to 25 items. 
This conserves administration time and at the same time probably provides 
sufficient true variance for the purposes of analysis. All tests are group tests, 
most are essentially power tests, and most involve multiple-choice items. 
The common belief seems to be that reasoning factors transcend the kind of 
test material, but to test that idea we used different kinds of material— 
forms, words, letters, and numbers—in different tests. A special effort was 
made to keep the apparent factorial complexity of most tests to a minimum. 

*Many of these tests were developed under a contract with the U.S. Navy Electronics
Laboratory, San Diego, under the monitorship of Arnold M. Small, during 1949.

In this first study, the number and kinds of tests were insufficient to
give us a clear examination of all the sub-hypotheses. For some sub-hypotheses
no unique tests were included in the battery, either because there was less
faith in those hypotheses or because suitable test ideas were not available,
with the expectation that those sub-hypotheses would be investigated in
subsequent studies. Among such sub-hypotheses were I, b, c, and e; II,
b, c, and e; and III, a. This is not to say that the kinds of abilities described
by these sub-hypotheses were not involved in any tests, for many of them
were represented in combination with other hypothesized abilities, as Table 1
will show. There was also the difficulty that after a test had been developed
to measure one hypothesized ability we were forced to recognize that it
probably also measured some other in our list. When a test is reported in
Table 1 to be an expected measure of two or more abilities it does not mean
that the abilities are necessarily of equal importance. It was hoped that two
tests measuring the same pair of abilities, for example, would be slanted
differently so as to effect separation in meaningful ways in the common-factor
space.

Two of the Blakey reasoning tests were included in the battery because 
they had proved to be good measures of factors in his analysis. Reference 
tests were included to define clearly the non-reasoning factors of verbal 
comprehension, numerical facility, perceptual speed, spatial orientation, and 
visualization, which were expected to occur in small degrees in various 
reasoning tests even though a special attempt to minimize these factors was 
made. Each test is described briefly in Table 1.* The hypothetical factor or 
factors that each test was designed or selected to measure are indicated in 
the last column of Table 1 according to the following code: I (reasoning I— 
II, III, and IV have similar meanings); Ia (reasoning I, hypothesis a— 
similarly for other hypotheses); N (numerical facility); P (perceptual speed); 
S (spatial orientation); V (verbal comprehension); and Vz (visualization). 


Testing, Scoring, and Factoring 


The test battery was administered to 144 Officer Candidates at Lackland 
Air Force Base and to 139 Air Cadets at Randolph Air Force Base, San 
Antonio, Texas.† A study of the age and educational levels of the two groups
showed differences of less than two years. Other statistical comparisons of 
data from the two groups showed that we were justified in combining them 
for a single factor analysis. More details concerning the two samples will be 
found elsewhere (2, 5). 

*Complete descriptions and sources of tests are given elsewhere (2, 4). We are indebted
to R. I. Blakey for the use of tests 32 and 33 and to L. L. Thurstone for permission
to use tests 1, 25, 26, 27, and 29. Test 31 was designed by R. C. Wilson.

†We are very much indebted to Dr. John T. Dailey, then Director, Directorate of
Personnel Research, Human Resources Research Center, Lackland Air Force Base, for
making the testing arrangements and for other assistance in carrying out the testing.
We are also indebted to Mr. William B. Lecznar, who assisted in many ways.

The correlation coefficients computed were mostly Pearson product-moment
r’s. A few variables had been dichotomized, and for these either
biserial r’s or tetrachoric r’s were computed as estimates of Pearson r’s.
The coefficients are given in Table 2. They are generally small but universally
positive, ranging from .003 to .602. There are very few zero correlations,
a fact which might suggest an oblique structure and which also promises
difficulty in achieving a unique orthogonal rotational solution.
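
For a dichotomized variable, one common textbook form of the biserial coefficient is r_bis = ((M1 - M0) / s)(pq / y), where M1 and M0 are the means of the continuous variable in the two groups, s is its standard deviation, p and q are the group proportions, and y is the ordinate of the unit normal curve at the point of dichotomy. The paper does not state which computing formula was used, so the Python sketch below is an assumption for illustration; the function name and the toy data are hypothetical.

```python
from statistics import NormalDist, mean, pstdev

def biserial_r(dichotomy, scores):
    """Biserial correlation between a 0/1 variable and a continuous one.

    Assumes the 0/1 variable is a cut on an underlying normal variable
    and uses the population standard deviation of the scores.
    """
    upper = [y for x, y in zip(dichotomy, scores) if x == 1]
    lower = [y for x, y in zip(dichotomy, scores) if x == 0]
    p = len(upper) / len(scores)       # proportion in the upper group
    q = 1.0 - p
    normal = NormalDist()              # unit normal distribution
    cut = normal.inv_cdf(q)            # point with proportion p above it
    ordinate = normal.pdf(cut)         # height of the curve at the cut
    return (mean(upper) - mean(lower)) / pstdev(scores) * (p * q / ordinate)

# Hypothetical toy data: four examinees, split at the median.
r_bis = biserial_r([0, 0, 1, 1], [1.0, 3.0, 2.0, 4.0])
print(round(r_bis, 4))
```

With samples this small the estimate is only a demonstration of the arithmetic; with markedly non-normal data the coefficient can exceed 1.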

Estimates of reliabilities of the scores were based upon 100 cases chosen 
at random from the Cadet group. The Kuder-Richardson formula 20 was 
applied to all except the highly speeded tests and except for test 21, for 
which a split-half reliability was estimated. For speed tests 28 and 30, retest 
reliability estimates were obtained from secondary sources. The reliability 
estimates are reported in Table 2. 
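
Kuder-Richardson formula 20 is usually written r = (k / (k - 1))(1 - Σpq / s²), where k is the number of items, p and q are the proportions passing and failing each item, and s² is the variance of total scores. The Python sketch below applies it to a small hypothetical score matrix; it assumes the variance is computed with N (not N - 1) in the denominator, and the function name is not from the paper.

```python
def kr20(item_scores):
    """Kuder-Richardson formula 20 for rows of 0/1 item scores.

    item_scores: one list per examinee, one 0/1 entry per item.
    Uses the population variance (divide by N) of the total scores.
    """
    n = len(item_scores)
    k = len(item_scores[0])
    totals = [sum(row) for row in item_scores]
    mean_total = sum(totals) / n
    var_total = sum((t - mean_total) ** 2 for t in totals) / n
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in item_scores) / n  # proportion passing item j
        sum_pq += p * (1.0 - p)
    return (k / (k - 1)) * (1.0 - sum_pq / var_total)

# Hypothetical data: four examinees, three items.
toy_scores = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
reliability = kr20(toy_scores)
print(reliability)
```

The formula assumes a power test scored by counting correct answers, which is why the highly speeded tests were handled differently in the study.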

Thirteen factors were extracted by Thurstone’s complete centroid
method. The thirteenth factor was not rotated because its highest loadings
were only about .16. The twelfth-factor residuals ranged from -.060 to
+.071, with a distribution that was leptokurtic. The centroid factor matrix
is given in Table 3.
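
The first step of a centroid extraction can be sketched as follows: with communality estimates on the diagonal, each first-factor loading is the column sum of the correlation matrix divided by the square root of the grand total, and the residual matrix is obtained by subtracting the products of loadings. This Python sketch is a simplified illustration, not the authors' computing routine; it omits the sign-reflection step that the complete method needs before each later factor, and all names and data are hypothetical.

```python
import math

def first_centroid_loadings(r):
    """First centroid factor loadings of a square correlation matrix.

    r: list of lists with communality estimates already on the diagonal.
    """
    total = sum(sum(row) for row in r)
    if total <= 0:
        raise ValueError("grand total must be positive")
    n = len(r)
    col_sums = [sum(row[j] for row in r) for j in range(n)]
    return [s / math.sqrt(total) for s in col_sums]

def residual_matrix(r, loadings):
    """Correlations remaining after removing one factor."""
    n = len(r)
    return [[r[i][j] - loadings[i] * loadings[j] for j in range(n)]
            for i in range(n)]

# Hypothetical one-factor matrix built from known loadings.
true_loadings = [0.8, 0.6, 0.5]
toy_r = [[a * b for b in true_loadings] for a in true_loadings]
loadings = first_centroid_loadings(toy_r)
residuals = residual_matrix(toy_r, loadings)
print([round(x, 3) for x in loadings])
```

For a matrix generated by a single factor the loadings are recovered exactly and the residuals vanish; residuals of the small size reported above are what prompts stopping the extraction.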

The reference axes were rotated independently by two individuals,
hereafter designated as X and Y.* Both used the Zimmerman graphic,
orthogonal system of rotation (11). In the Y solution, the investigator was
guided mostly by Thurstone’s objective criteria of positive manifold and
simple structure but paid some attention also to psychological meaningfulness
in terms of knowledge of the tests, of the hypotheses to be tested, and of
previous results with the familiar tests. The other investigator (in the X
rotations) made a frank attempt to put axes through or near the familiar
tests that have consistently defined known factors and to see how many of
the reasoning hypotheses could be verified. Positive manifold and simple
structure were achieved more or less as by-products. The two rotational
solutions gave interpretable factors, 11 of which could be matched in the
two solutions. One in each solution stood alone and only one of these was
interpretable. The two rotated-factor matrices are given in Tables 4 and 5.
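
Orthogonal rotations of this kind are commonly carried out one pair of axes at a time, the angle being read from a plot of the two columns of loadings. The Python sketch below shows only the arithmetic of one such pairwise rotation and checks that communalities are unchanged; the loadings, the angle, and the function names are hypothetical and are not taken from the study.

```python
import math

def rotate_pair(loadings, i, j, degrees):
    """Rotate factor columns i and j of a loading matrix orthogonally."""
    c = math.cos(math.radians(degrees))
    s = math.sin(math.radians(degrees))
    rotated = [row[:] for row in loadings]  # copy so the input is untouched
    for row in rotated:
        a, b = row[i], row[j]
        row[i] = a * c + b * s
        row[j] = -a * s + b * c
    return rotated

def communalities(loadings):
    """Sum of squared loadings for each test."""
    return [sum(x * x for x in row) for row in loadings]

# Hypothetical loadings of three tests on two centroid factors.
toy_loadings = [[0.6, 0.4], [0.5, -0.3], [0.7, 0.1]]
rotated_loadings = rotate_pair(toy_loadings, 0, 1, 30.0)
before = communalities(toy_loadings)
after = communalities(rotated_loadings)
```

Because each rotation is orthogonal, two analysts starting from the same centroid matrix can reach different rotated solutions, as X and Y did, without changing any test's communality.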


Interpretation of the Factors 


The factors are presented in the general order of their familiarity and
definiteness. Any test having a loading in either of the solutions of .30 or
higher will be considered in the interpretations. The loadings will be listed
according to solutions X and Y, respectively. Tests are identified by number
and name.


A. Verbal comprehension (V)        X     Y
10  Vocabulary                    .53   .60
14  Inference                     .50   .41
 4  Verbal Analogies I            .48   .47
13  Verbal Analogies II           .42   .28
 1  Sound Grouping                .41   .47
12  Word Classification           .36   .38
24  Correlate Completion          .22   .30


*One of the authors, R. F. Green, and Wayne S. Zimmerman.








TABLE 1

Summary of Test Requirements and Corresponding Hypotheses

Test                             Task Required for Item                                          Hypothesized Factor Content

 1. Sound Grouping               Which one of five words sounds different?                       IIIab
 2. Figure Classification        Define classes of figures and assign other figures to the       IIIac
                                   correct classes
 3. Letter Triangle              Find system in a triangular pattern of letters                  IIbe
 4. Verbal Analogies I           Multiple-choice analogy, first pair difficult                   IId
 5. Figure Matching              Select figure having most in common with given figure           Id, IIIac
 6. Essential Operations         Which information is not essential to the solution of the       Ice
                                   arithmetical reasoning problem?
 7. Remote Verbal Similarities   Select word that has most in common with given word             IIIab
 8. Prescribed Relations         Select figure that embodies the stated changes of the given     IIId
                                   figure
 9. Problem Solving              Solve arithmetic-reasoning problems                             Iabcde
10. Vocabulary*                  Indicate meaning of word presented in brief context             V
11. Figure Exclusion             Which one of five figures does not belong?                      IIIac
12. Word Classification          Which one of five class names does not belong?                  IIIab
13. Verbal Analogies II          Multiple-choice analogy, second pair difficult                  IIId
14. Inference                    Select correct one of five conclusions from given statement     IVa
15. Hidden Figures               Indicate which of five figures is contained in given figure     IIf
16. Number and Operations I      Which equation is true after a certain interchange of signs     Ibd
                                   or numbers is introduced?
17. Number and Operations II     What interchanges will make the equations true?                 Iacd
18. Number and Operations III    First discover which interchange corrects main equation,        Iabcd
                                   then select an equation that is made true by that
                                   interchange
19. Figure Matrix                Discover system of changes in a 3x3 matrix of figures           IIbe
20. Word Matrix                  Discover system of changes in a matrix of words                 IIbcde
21. Ship Destination             Find best port for ship, considering the influences of          Ia
                                   several variables
22. Syllogisms                   Select correct syllogistic conclusions                          IVb
23. Figure Analogies Completion  Draw figure which correctly completes figure analogy            IId, IIId
24. Correlate Completion         Complete last correlate of pair series                          IIade, IIId
25. Secret Writing               Decode numbers representing letters                             Id, IIa
26. Identical Forms*             Which form is the same as the one given?                        P
27. False Premises               Is syllogism (nonsense type) true or false?                     IVb
28. Numerical Operations*        Indicate correct answer in arithmetical computation             N
29. Punched Holes*               Indicate pattern of holes in unfolded paper after it is         Vz
                                   punched while folded
30. Perceptual Speed*            Which form is the same as the one given?                        P
31. Symbol Manipulation          Mark symbolically presented “If-Then” statements true or        IVab
                                   false
32. Form Reasoning               Solve simple equations given in terms of familiar forms         Ia
33. Circle Reasoning             Discover rule for marked circle in patterns                     IIbe
34. Space Orientation            Determine position in space from which picture was taken        S

*Test included as a reference test because of its known factor content.





144 PSYCHOMETRIKA 


TABLE 2
The Correlation Matrix*

     1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17

 1 --- 231 252 401 177 264 185 300 258 452 126 239 339 349 246 203 260
 2 231 --- 224 272 147 138 210 271 304 209 192 195 248 174 144 315 286
 3 252 224 --- 185 084 158 119 314 442 091 194 144 193 169 252 267 331
 4 401 272 185 --- 188 313 284 325 363 441 113 275 425 463 255 285 355
 5 177 147 084 188 --- 096 044 247 070 161 141 106 213 121 207 144 123
 6 264 188 158 313 096 --- 167 261 354 252 154 110 223 307 123 205 257
 7 185 210 119 284 044 167 --- 199 162 275 064 150 190 252 194 216 157
 8 300 271 314 325 247 261 199 --- 422 363 143 119 281 313 345 387 393
 9 258 304 442 363 070 354 162 422 --- 228 263 197 181 412 356 304 469
10 452 209 091 441 161 252 275 363 228 --- 110 243 334 456 261 252 242
11 126 192 194 113 141 154 064 143 263 110 --- 042 280 098 209 234 133
12 239 195 144 275 106 110 150 119 197 243 042 --- 176 269 048 178 126
13 339 248 193 425 213 223 190 281 181 334 280 176 --- 414 208 287 253
14 349 174 169 463 121 307 252 313 412 456 098 269 414 --- 250 270 325
15 246 144 252 255 207 123 194 345 356 261 209 048 208 250 --- 265 305
16 203 315 267 285 144 205 216 387 304 252 234 178 287 270 265 --- 439
17 260 286 331 355 123 257 157 393 469 242 133 126 253 325 305 439 ---
18 274 279 322 326 122 278 219 376 393 265 152 134 247 305 288 433 496
19 138 247 224 184 250 208 134 356 339 093 237 140 124 103 179 306 267
20 316 209 225 443 132 278 221 362 356 300 136 202 381 397 374 358 359
21 133 095 250 151 083 210 040 255 372 131 155 105 066 131 077 118 236
22 340 212 228 391 162 313 149 362 382 405 189 214 323 422 277 251 305
23 332 378 398 357 253 229 188 438 447 240 308 137 294 340 421 382 389
24 328 231 303 400 226 252 230 379 357 423 189 213 298 387 336 324 334
25 255 318 439 349 177 281 176 449 420 213 202 170 232 286 272 406 397
26 174 216 220 236 121 097 091 227 254 151 177 060 053 150 184 195 257
27 301 110 183 388 226 416 159 343 327 286 140 136 359 386 209 282 246
28 207 106 178 099 023 251 089 202 175 142 101 063 137 172 114 262 204
29 252 421 343 212 292 154 151 299 480 078 311 231 273 253 383 355 312
30 122 230 272 156 220 092 139 278 282 073 316 069 099 152 228 321 319
31 369 295 136 384 141 234 136 313 272 308 082 099 302 383 214 348 263
32 003 136 250 043 187 107 057 290 266 160 148 114 104 156 207 275 313
33 132 200 358 212 096 124 155 217 410 143 165 239 248 192 202 333 370
34 198 129 255 248 280 211 117 453 382 166 196 090 120 129 279 274 271

*Decimal points have been omitted.










TABLE 2, Continued

Columns: 18. Number & Operations III; 19. Figure Matrix; 20. Word Matrix; 21. Ship Destination;
22. Syllogism Test; 23. Figure Analogies Completion; 24. Correlate Completion; 25. Secret Writing;
26. Identical Forms; 27. False Premises; 28. Numerical Operations; 29. Punched Holes;
30. Perceptual Speed; 31. Symbol Manipulation; 32. Form Reasoning; 33. Circle Reasoning;
34. Spatial Orientation; Rel. = Reliabilities

    18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  Rel.

 1 274 138 316 133 340 332 328 255 174 301 207 252 122 369 003 132 198
 2 279 247 209 095 212 378 231 318 216 110 106 421 230 295 136 200 129   5?
 3 322 224 225 250 228 398 303 439 220 183 178 343 272 136 250 358 255   57
 4 326 184 443 151 391 357 400 349 236 388 099 212 156 384 043 212 248   48
 5 122 250 132 083 162 253 226 177 121 226 023 292 220 141 187 096 280   43
 6 278 208 278 210 313 229 252 281 097 416 251 154 092 234 107 124 211   53
 7 219 134 221 040 149 188 230 176 091 159 089 151 139 136 057 155 117   49
 8 376 356 362 255 362 438 379 449 227 343 202 299 278 313 290 217 453   54
 9 393 339 356 372 382 447 357 420 254 327 175 480 282 272 266 410 382   80
10 265 093 300 131 405 240 423 213 151 286 142 078 073 308 160 143 166   65
11 152 237 136 155 189 308 189 202 177 140 101 311 316 082 148 165 196
12 134 140 202 105 214 137 213 170 060 136 063 231 069 099 114 239 090   22
13 247 124 381 066 323 294 298 232 053 359 137 273 099 302 104 248 120   36
14 305 103 397 131 422 340 387 286 150 386 172 253 152 383 156 192 129   64
15 288 179 374 077 277 421 336 272 184 209 114 383 228 214 207 202 279   83
16 433 306 358 118 251 382 324 406 195 282 262 355 321 348 275 333 274   56
17 496 267 359 236 305 389 334 397 257 246 204 312 319 263 313 370 271   80
18 --- 237 312 226 299 381 353 435 145 241 327 267 266 227 283 271 167   67
19 237 --- 232 357 229 325 221 408 226 249 132 267 343 118 299 212 247   77
20 312 232 --- 108 375 495 486 333 116 364 119 338 094 304 068 221 323   59
21 226 357 108 --- 164 199 161 297 211 210 236 081 179 142 258 263 243   77
22 299 229 375 164 --- 385 343 328 067 413 105 202 197 378 092 170 373   64
23 381 325 495 199 385 --- 441 416 219 318 210 491 303 285 239 216 407   66
24 353 221 436 161 343 441 --- 339 077 332 211 255 165 250 196 320 237   75
25 435 408 333 297 328 416 339 --- 284 298 275 358 376 303 407 391 365   98
26 145 226 116 211 067 219 077 284 --- 136 193 219 550 148 383 250 175
27 241 249 364 210 413 318 332 298 136 --- 215 136 064 301 103 208 265   50
28 327 132 119 236 105 210 211 275 193 215 --- 029 224 263 288 348 108   92
29 267 267 338 081 202 491 255 358 219 136 029 --- 387 206 242 318 460   90
30 266 343 094 179 197 303 165 376 550 064 224 387 --- 075 602 293 178   92
31 227 118 304 142 378 285 250 303 148 301 263 206 075 --- 149 127 110   76
32 283 299 168 258 092 239 196 407 383 103 288 242 602 149 --- 346 225   94
33 271 212 221 263 170 216 320 391 250 208 348 318 293 127 346 --- 364   91
34 167 247 323 243 373 407 237 365 175 265 108 460 178 110 225 364 ---   96

*Decimal points have been omitted.







TABLE 3
Centroid Factor Loadings and Communalities*

       A    B    C    D    E    F    G    H    I    J    K    L   h²

 1  506  315 -040 -073  051 -088  083 -248 -090 -028  072  030  456
 2  458 -053  151 -196  034 -089  214  045 -148 -101  142  093  392
 3  496 -223  133  147 -085  051  066 -150 -069  027  081  144  405
 4  587  387  058 -064 -084  088 -085  104 -123 -109  069 -108  578
 5  326 -023 -069 -143  295  076 -061  035  160 -182 -136  032  309
 6  443  196 -151  226  030  067  108  054 -096  086 -062 -092  358
 7  329  153  097 -093 -111 -064  018  033 -027 -112 -075 -101  195
 8  631 -049 -090  115  124 -157 -145  058  084 -160  068  034  525
 9  657 -119  110  210 -044  173 -051 -102 -174  130  176  133  648
10  494  416 -122 -148 -104 -136 -136 -123  021 -163 -026  092  553
11  354 -150  074 -112  171  144  095 -070 -112  140 -238  065  322
12  312  161  066 -090 -153  178  080 -044  048 -180  082  080  247
13  490  294  102 -166  102  076  160  131  107  116 -098  058  461
14  555  400 -030 -059 -155  080 -122  050  052  179  056  172  588
15  490 -032  144 -032  108 -142 -258 -112  047  123 -087  037  400
16  582 -081  109 -029 -032 -187  145  203  142  064  028 -094  490
17  605 -117  102  095 -171 -114 -084  161 -075  092  117 -016  502
18  578 -064  053  111 -205 -248  071  145 -075  052 -051  081  501
19  472 -246 -143  078  119  072  079  134 -095 -219 -129  037  429
20  575  239  248  115  102 -046 -127  055  057  064 -034 -086  510
21  371 -173 -228  280 -074  143  079 -130 -114 -109 -052  074  380
22  556  268 -056  130  159  078 -119  045 -051  043  089  097  471
23  667 -080  188  042  241 -125 -104 -043 -077  062 -052  071  530
24  587  170  118  081 -066 -092 -107 -079  105 -064 -186  086  482
25  652 -193 -062  119 -048  016  090  124  032 -076  100  060  527
26  400 -285 -257 -272 -125  050 -110 -069 -157  047  081 -244  510
27  515  277 -168  222  149  134  033  104  061  108 -063 -098  500
28  362 -065 -214  130 -191 -159  266 -124  142  216 -116 -100  436
29  561 -250  372 -265  275  182  035 -058  065  054  151  090  737
30  486 -467 -228 -351 -121  114 -148  112 -153  153 -088 -050  748
31  473  227 -147 -080  069 -193  103  063  050  109  269  088  454
32  446 -431 -306 -128 -224  031 -126  057  247  065 -129  112  659
33  501 -209  142  086 -322  218  113 -169  308  023  043 -158  637
34  500 -174  032  186  274  140 -191 -186  171 -117  167 -182  569

*Decimal points have been omitted.
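
As a check on reading Table 3, a communality can be recomputed as the sum of the squared loadings in a row. The sketch below does this for test 1, using its twelve centroid loadings with the decimal points restored. The function name `communality` is ours, introduced only for illustration.

```python
def communality(loadings):
    """Return the sum of squared factor loadings for one test."""
    return sum(a * a for a in loadings)

# Row 1 of Table 3 (factors A through L), decimal points restored.
test_1 = [0.506, 0.315, -0.040, -0.073, 0.051, -0.088,
          0.083, -0.248, -0.090, -0.028, 0.072, 0.030]

h2 = communality(test_1)
print(round(h2, 3))  # compare with the tabled communality of .456
```

The same check applied to other rows gives a quick way to spot a misread digit in a loading or in the h² column.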











TABLE 4
Final Rotated Matrix-X*

      A   B   C   D   E   F   G   H   I   J   K   L
      V   N   P  Vz   S  GR  LR  PR  CR  CP  EC  SS   h²

 1   41  22 -04  15  15  30  21  04  05  04  24 -03  457
 2   21  07  10  32  19  32 -02 -06  17  00  03  16  395
 3  -05  13  14  22  10  36  13  09  11  35  07  09  411
 4   49 -05  02  05  16  16  28  11  38  07  19 -07  580
 5   17  03  12  30  16 -09  01  35 -02 -05  12  07  313
 6   16  25  04  03  04  14  33  21  31  03  00  ??  361
 7   25  05  10  09  06  04  01  01  26  06  20  ??  203
 8   07  17  10  14  28  16  20  35  16  03  30  28  528
 9   01  05  18  19  19  43  39  14  18  37  05  13  646
10   53  12  01 -02  04  15  ??  14  06  01  41  12  555
11   ??  ??  ??  ??  00  10  11  07  07  10  00 -10   ??
12   36 -02 -04  07  04  15 -01  11  10  23  01  13  251
13   42  15  04  36  00 -05  28  04  19  03  02  13   ??
14   50  11  10  05 -03  04  47 -06 -01  19  15  17   ??
15   02  06  17  27  14  04  28  05  04  14  40  10   ??
16   09  29  12  25  21  00  13  01  34  09  11  35   ??
17   00  11  21  08  18  20  27  01  35  19  17  34   ??
18   03  27  19  11 -01  23  17  02  35  10  22  ??   ??
19   00  13  28  21  12  20 -03  41  23  00  01  12  431
20   17  05 -07  27  13  01  39  10  32  13  30  10  512
21  -04  22  18 -03  02  33  05  36  12  22 -02 -04  388
22   25  05  01  13  13  19  47  27  14  03  13  09  471
23   00  10  17  44  20  22  31  14  17  08  32  13  594
24   22  16  04  21 -04  09  21  20  21  22  41  14  490
25   06  23  20  15  22  24  14  27  26  20  02  ??  535
26   12  09  53 -08  40  ??  ?? -01  09  12  06  00  512
27   23  25  01  09  ??  ??  ??  ??  ??  ??  ?? -04  508
28   04  59  15 -02  01  04  09  00  13  19  05  04  438
29   12 -09  17  64  36  12  13  01  01  27  00  20  736
30   06  04  79  09  25  05  03  05  09  13  01  16  749
31   29  28 -01  07  23  18  32  00  ??  ??  06  28  445
32   01  26  57 -02  09 -08 -01  25 -06  28  08  ??  666
33   09  25  09  10  19 -01  00  08  22  67  01  14  646
34  -07  04  00  20  50  05  19  38  08  27  15  01  579

*Decimal points have been omitted.







TABLE 5
Final Rotated Matrix-Y*

      A   B   C   D   E   F   G   H   J   K   L   M
      V   N   P  Vz  S₁  GR  LR  PR  CP  EC  SS   ?   h²

 1   47  25  04  22  00  13  12  01  01  16  23  00  446
 2   21  04  15  37 -05  04 -02  20  24  ??  27  03  389
 3   04  15  08  31  13  33 -06  01  28  08  12  20  400
 4   ?? -05  08  04  ??  ??  ??  ??  ??  ??  ??  ??  570
 5   17  04  12  14  29 -03  04  35 -10  11 -02  04  302
 6   12  27  01  00  05  34  28  15  03  05  20 -04  351
 7   26  08  07  04 -04  03  12  15  23  14  04 -04  196
 8   15  13  14  10  18  19  03  32  07  36  29  24  512
 9   09  07  11  35  17  53  12 -02  24  14  23  18  630
10   60  13  04  00 -01  03  20  10  04  24  14  19  546
11   02  16  19  39  12  14  ??  15  03 -05 -12  02  317
12   38 -03 -02  06  13  12  03  06  23 -05  08  02  245
13   28  13  00  24  17 -04  42  20  ?? -01  16 -04  452
14   41  03  00  07  13  14  48 -04  16  11  23  21  578
15   09  11  14  29  13  06  19  04  08  41  02  19  395
16   01  22  19  15  12 -02  17  25  38  17  32  06  485
17   01  06  21  13  ??  ??  ??  ??  37  25  29  20  491
18   04  24  11  ?? -11  17  17  19  37  16  24  28  491
19   03  13  21  13  11  29 -04  46  06  00  07  14  415
20   17  10 -06  20  14  15  33  18  21  40  17 -04  499
21   06  24  08  01  10  47 -10  14  04 -04  04  19  373
22   26  04 -03  16  16  30  31  16 -04  21  29  10  466
23   ??  16  14  46  12  19  16  24  10  37  16  14  588
24   30  23 -04  14  12  13  22  18  23  32  02  20  473
25   ??  ??  ??  ??  ??  30  02  28  28  07  31  21  516
26   12  06  66  01 -05  ??  ??  ??  08  06  07  05  504
27   15  24 -02 -02  23  30  38  21 -03  12  24 -05  489
28   00  56  14 -06  08  08  10 -03  19 -02  15  12  482
29   09 -09  24  64  39  04  02  11  24  11  12 -02  731
30  -02  00  75  14  10  16  14  12  14 -04 -03  26  740
31   26  14  09  12  06 -02  19  06  02  08  53  12  452
32  -02  16  46 -07  32  06  03  14  20 -02 -02  49  648
33   10  24  16  02  42  22 -04 -07  55  07  01  02  628
34   03  06  15  15  47  30 -10  12  01  38  16 -05  553

*Decimal points have been omitted.
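
The communalities in the rotated matrices stay close to those of the centroid matrix (for test 5, .309 in Table 3 and .302 in Table 5). That is what one would expect if the axes were rotated orthogonally, since such a rotation leaves the sum of squared loadings for a test unchanged. The sketch below illustrates the point for a single pair of axes. The function `rotate_pair`, the 30-degree angle, and the choice of two loadings are assumptions made only for the example, not a description of the rotations actually carried out.

```python
import math

def rotate_pair(a, b, degrees):
    """Rotate one test's loadings on two orthogonal axes through an angle."""
    t = math.radians(degrees)
    return (a * math.cos(t) + b * math.sin(t),
            -a * math.sin(t) + b * math.cos(t))

# Two centroid loadings of test 1 (factors A and B), decimal points restored.
a, b = 0.506, 0.315
a_rot, b_rot = rotate_pair(a, b, 30.0)  # hypothetical rotation angle

before = a * a + b * b
after = a_rot ** 2 + b_rot ** 2
print(before, after)  # the two sums of squares agree up to rounding error
```

Small discrepancies between the tabled communalities would then reflect rounding and the accumulated error of successive graphical rotations rather than any change in the variance accounted for.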













This is clearly the verbal-comprehension factor. The list of tests is 
headed, as usual, by the Vocabulary test, and it contains most of the verbal 
tests in the battery (seven of the ten). In the construction of verbal tests not 
appearing in the list, efforts had been made to minimize the verbal-factor 
variance, apparently with some success. 

The verbal variance is higher than expected in two of the tests, Infer-
ences and Sound Grouping. In the former, not enough attention was paid to
keeping vocabulary simple, for such words as “martyr,” “allegiance,” and
“privileges” appear. The loading in the Sound Grouping test is supported
by Zimmerman’s finding of a loading of .30 (12). It may be that the inter-
pretation of the verbal factor should play down the term “comprehension”
or should even substitute the label “verbal knowledge,” for the Sound
Grouping test requires only that words be known well enough to pronounce
them. It is possible, however, that learning meanings and pronunciations go
together and therefore remain correlated.


B. Numerical facility (N)*                   X     Y
28 Numerical Operations                     .59   .56
16 Number and Operations Changes I          .29   .22
31 Symbol Manipulation                      .28   .14


This is the well-defined numerical-facility factor. On the whole, the
number variances appeared in tests where expected but in such small quan-
tities that they were of no help in rotations in this respect. Their low loadings,
however, help to make reasonable the forcing of an axis through the Numerical
Operations test.


C. Perceptual speed (P)                      X     Y
30 Perceptual Speed                         .79   .75
32 Form Reasoning                           .57   .46
26 Identical Forms                          .53   .66
11 Figure Exclusion                         .??   .19


This is the visual, perceptual-speed factor. The two reference tests 
put into the battery to account for this factor are high in the list. 

The high loadings for Form Reasoning came as no surprise. In spite 
of the fact that Blakey had found this test to be factorially pure on what he 
thought was a reasoning factor (1), it looks like a perceptual-speed test, 
at least for superior examinees. The items are so easy that the task is a 
matter of rapid identification of simple geometric forms. 


*Here we have liberalized the rule of a minimum loading of .30 by extending the 
lower limit of listed tests to .28 in order to avoid basing interpretation on a single test. 
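
The listing rule described in this footnote can be sketched as a simple filter. In the sketch below the function name `listed` and the decision to accept a test when either solution reaches the minimum are our assumptions for illustration. The loadings are those given in the text for two tests on the numerical-facility factor.

```python
def listed(x_loading, y_loading, minimum=0.30):
    """True if a test reaches the minimum loading in either solution."""
    return max(x_loading, y_loading) >= minimum

# (X, Y) loadings on factor B as listed in the text.
factor_b = {
    "Number and Operations Changes I": (0.29, 0.22),
    "Symbol Manipulation": (0.28, 0.14),
}

# Usual rule: minimum loading of .30.
usual = [name for name, (x, y) in factor_b.items() if listed(x, y)]

# Liberalized rule for this factor: lower limit extended to .28.
liberal = [name for name, (x, y) in factor_b.items()
           if listed(x, y, minimum=0.28)]

print(usual)
print(liberal)
```

Under the usual limit neither of these two tests would be listed, which is why the limit was extended for this factor.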










D. Visualization (Vz)                        X     Y
29 Punched Holes                            .64   .64
23 Figure Analogies Completion              .44   .46
11 Figure Exclusion                         .41   .39
13 Verbal Analogies II                      .36   .24
2 Figure Classification                     .32   .37
5 Figure Matching                           .30   .14
3 Letter Triangle                           .22   .31
9 Problem Solving                           .19   .35


This is the visualization factor that was distinguished from a space 
factor in the Army Air Force research (3). It is most heavily weighted in 
Thurstone’s Punched Holes test, which was put into the battery to identify it. 

The visualization factor is defined as an ability to manipulate or trans- 
form a pictorially or verbally given object into another visual arrangement. 
Its appearance in a verbal test is not new. It has shown small variances in 
such tests as Reading Comprehension and Arithmetic Reasoning. It is 
apparently of general utility in problems that require reasoning. This does 
not make it a reasoning factor; it is probably merely an aid in problem solving. 
It is more likely to come into play in connection with pictorially presented 
problems, as attested by the preponderance of figural tests in the list. 


E. Spatial orientation (S₁)                  X     Y
34 Spatial Orientation                      .50   .47
26 Identical Forms                          .40  -.05
29 Punched Holes                            .36   .39
33 Circle Reasoning                         .19   .42
32 Form Reasoning                           .09   .32


This factor is probably the same as that called “spatial relations,” S₁,
in the Army Air Force research (3). Beyond the first and third tests in the
list, the factor loadings of the two solutions differ more than usual. Circle
Reasoning might well be expected to have some of this spatial-factor variance,
as it has in the Y solution. In Form Reasoning the order of the forms as to
right-left is a feature of the items. There is no reason to expect variance of 
this factor in Identical Forms. From these points of view, the Y rotation 
comes closer to expectations. 
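
The extent of the disagreement between the two solutions on this factor can be made concrete by taking the absolute difference between the X and Y loadings of each listed test. The sketch below is only an illustration. The dictionary name and the use of a simple difference are our choices, and the loadings are those of the list above, with the Y loading for Punched Holes read from Table 5.

```python
# (X, Y) loadings on factor E for the five listed tests.
factor_e = {
    "Spatial Orientation": (0.50, 0.47),
    "Identical Forms": (0.40, -0.05),
    "Punched Holes": (0.36, 0.39),
    "Circle Reasoning": (0.19, 0.42),
    "Form Reasoning": (0.09, 0.32),
}

# Absolute X-Y difference per test, rounded to two decimals.
diffs = {name: round(abs(x - y), 2) for name, (x, y) in factor_e.items()}

for name, d in diffs.items():
    print(f"{name:22s} {d:.2f}")
```

The first and third tests differ by only .03, whereas the other three differ by .23 to .45.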

The factor S₁ has come more and more to mean the ability to appreciate
the spatial order or arrangement of objects, with the observer’s own body
as the frame of reference (7). The name “spatial orientation” is used here in
preference to “spatial relations” because it more aptly describes an adaptive
ability.










F. General reasoning (GR)                    X     Y
9 Problem Solving                           .43   .53
3 Letter Triangle                           .36   .33
21 Ship Destination                         .33   .47
2 Figure Classification                     .32   .04
1 Sound Grouping                            .30   .13
25 Secret Writing                           .24   .30
22 Syllogism Test                           .19   .30
6 Essential Operations                      .14   .34
34 Spatial Orientation                      .04   .30
27 False Premises                           .00   .30


The two solutions agree as to the three leading tests on this factor. 
The factor is evidently reasoning I or the general-reasoning factor defined 
in the Air Force research (3). An arithmetic-reasoning test, like Problem 
Solving, was the chief defining variable. As usual, it has loadings in a rather 
wide variety of tests. 

One of the systematic differences between the two solutions is that the 
Y list includes tests such as False Premises and the Syllogism test that are 
distinctive of the group under factor G, defined as logical reasoning. In 
Zimmerman’s re-rotation of Thurstone’s data from his primary-mental- 
abilities study, he identified the general-reasoning factor but did not find 
any of the logical-type tests loaded on it (12). False Premises and another 
syllogism test were in Thurstone’s battery. Thus Zimmerman’s solution on 
the Thurstone battery is consistent with the X solution on our reasoning 
battery. 

This apparent separation of logical from non-logical processes in reasoning,
as we have here in factors F and G, will be stressed in forming a new
hypothesis concerning factor F, general reasoning. Comments from here on 
will be based primarily on the X rotations. 

Test 21, Ship Destination, was originally constructed as a measure of 
reasoning I (general reasoning) on the assumption that the ability is a matter 
of symbol manipulation. Some of the other tests on this factor, however, 
do not fit the symbol-manipulation hypothesis very well, for example, Figure 
Classification and Sound Grouping. 

It has been observed before (3) that reasoning I seems characteristic 
of tests whose items cover a wide range of difficulty. Tests that have rather 
homogeneous and low levels of difficulty are commonly lacking in this factor. 
It is true that some tests with wide range of difficulty do not have variance 
in this factor. Is there any difference between these tests and others graded 
in difficulty which do have variance in the factor general reasoning? 

For the most part, tests of graded difficulty that have no variance in 
this factor are of a more logical type. Most of the tests that appear on this 










factor involve the manipulation of abstract symbols that have little realistic 
meaning. As problems get harder, trial-and-error manipulation may reason- 
ably be assumed to become more important. This factor, then, may represent 
a less logical manipulation of symbols in a trial-and-error fashion. This 
would explain why difficult items in almost any type of test introduce some 
general-reasoning variance. This conception is very close to Heidbreder’s 
idea of “spectator behavior” (6). It may also be akin to Tolman’s concept 
of VTE (vicarious trial and error). Effective trial-and-error approach would 
presumably be most dependent upon the speed with which new solutions 
are tried and on the ease with which failing solutions are rejected. Such 
behavior is likely to be common to most of the leading tests in the list for 
this factor. 

Other hypotheses as to the nature of reasoning I can still be entertained. 
A hypothesis of symbolic span, that is, the ability to manipulate simul- 
taneously a large number of symbols or to apprehend a more complex pattern 
of symbols, has much merit. A speed-of-symbol-manipulation hypothesis 
could also be entertained, since effectiveness of trial and error would depend 
upon how efficiently the wrong trials are handled. 


G. Logical reasoning (LR)                    X     Y
14 Inference Test                           .47   .48
22 Syllogism Test                           .47   .31
27 False Premises                           .45   .38
20 Word Matrix                              .39   .33
9 Problem Solving                           .39   .12
6 Essential Operations                      .33   .28
31 Symbol Manipulation                      .32   .19
23 Figure Analogies Completion              .31   .16
13 Verbal Analogies II                      .28   .32


This is probably the same factor that has been defined as deduction by 
other investigators. All three of the strictly logical, syllogistic type of tests 
are conspicuously loaded on it. 

Presumably the chief process in deductive reasoning is the drawing of 
inferences or conclusions. But note that the tests that have been used to 
define the so-called deductive factor have been of the multiple-choice or 
true-false form. The examinee does not have to draw his own conclusion. 
Conclusions are given to him and what he must do is to decide which one is 
correct. Deciding about the correctness of a conclusion may be a different 
psychological process than producing the conclusion. It may rather be an 
act of judgment or evaluation. The criterion of evaluation in these tests is 
that of logical necessity. We might describe this factor as a sensitivity to 
logical necessity. The name we have chosen, “logical reasoning,” is of broader 
connotation to allow for other possible descriptions. It remains to be seen 
















whether deduction tests in the form of completion items will give rise to an 
additional factor distinct from this one. If they do, it will be interesting to 
see whether such a factor is identical with our factor K (eduction of cor- 
relates) to be discussed later. 


H. Eduction of perceptual relations* (PR)    X     Y
19 Figure Matrix                            .41   .46
34 Spatial Orientation                      .38   .12
21 Ship Destination                         .36   .14
5 Figure Matching                           .35   .39
8 Prescribed Relations                      .30   .32
27 False Premises                           .32   .21


The leading test in this list, Figure Matrix, suggests that this factor is 
the same as the Air Force reasoning II, in which the same kind of test was 
also a leader (3). In the Air Force analysis a figure-analogies test was about 
equally loaded in the factor. In our battery we had no orthodox figure- 
analogies test, so we do not have this possibility of confirmation of the 
identity of the factor. Prescribed Relations, however, is a variant form of 
the figure-analogies test. 

Our main hypothesis II stressed seeing relationships, systems, trends, 
or patterns. The many tests that were thought to be related by communality 
in such a factor divided three ways in our analysis, factor H representing 
one of them. We then had the task of distinguishing the three groups of 
tests, under factors H, I, and J. Under factor H are perceptual tests, with 
the exception of False Premises, which has a loading of .32 in the X solution. 
The three tests on which the two solutions agree definitely involve the 
comparison of figures and the noting of relationships. We therefore name
this factor “eduction of perceptual relations.” We would lay stress on the
contrast between this group of tests and the next, under factor I. That the
Figure Matrix test goes into the one group and the Word Matrix test, very
similar in form, but differing in content, goes into another, is a most striking
finding.


I. Eduction of conceptual relations (CR)     X
4 Verbal Analogies I                        .38
18 Number and Operations Changes III        .35
17 Number and Operations Changes II         .34
16 Number and Operations Changes I          .34
20 Word Matrix                              .32
6 Essential Operations                      .31


*In this and in three other factor names we have found it desirable to use the term
“eduction,” a term given prominence by Spearman in connection with his principles of
cognition (8).










This factor was identified in solution X only. It constitutes a rather 
meaningful picture, however, so we will attempt to interpret it. All the tests 
except possibly Essential Operations involve the grasping of relationships. 
In Verbal Analogies I an attempt had been made to emphasize this very 
process. Verbal Analogies II is not in the list, as we expected, since the 
seeing of relationships in it was made as easy as possible. Number and Opera- 
tions Changes I was not expected in such a list but it may require more 
relational thinking than was anticipated. Since the relationships involved 
in these tests, by contrast to those for factor H, are of verbal and numerical 
types, the factor has been named “eduction of conceptual relations.” 

This is the first instance that we know of in factor-analysis results where 
reasoning factors separate along the lines of the material content of the items 
as they do in factors H and I. This finding is contrary to the belief that 
reasoning abilities transcend the kind of material about which one reasons. 
It does, however, lend some support to the distinction sometimes made 
between concrete and abstract reasoning. On the other hand, none of our 
results supports the distinctions sometimes made between verbal reasoning 
and numerical or quantitative reasoning. Both types of relationships are 
involved in the tests loaded with this factor. 


J. Eduction of conceptual patterns (CP)      X     Y
33 Circle Reasoning                         .67   .55
9 Problem Solving                           .37   .24
3 Letter Triangle                           .35   .28
16 Number and Operations Changes I          .09   .38
17 Number and Operations Changes II         .19   .37
18 Number and Operations Changes III        .10   .37


The Circle Reasoning test and the Letter Triangle test both require 
the examinee to find a rule or system. A seeing-rules or a seeing-systems 
hypothesis best fits factor J. It is probably the same as Thurstone’s induction 
factor, which he defines as the ability to see rules or principles. 

There are no strictly perceptual tests in the list, where properties of 
visual objects determine their relationships. The patterns conceived are 
conceptual rather than perceptual, hence the term “conceptual” in the 
factor name. Since we have proposed a distinction between seeing perceptual 
relationships and seeing conceptual relationships in factors H and I, one 
might expect by analogy two factors for the eduction of patterns. This is a 
possibility to be explored in future investigations. At any rate, there is some 
indication in factor J that the formation of patterns is something more than 
or is different from the eduction of relationships. 

The presence of Problem Solving in this list of tests is worthy of comment. 
It has been known that arithmetic-reasoning tests are factorially complex 










and that there is much true variance in Problem Solving still to be accounted 
for. Finding that an ability of educing a pattern is involved in this test is 
not unreasonable. Hypothesis Ic, defining problems (or structuring problems), 
was proposed as one conception of reasoning I, the leading component of 
the variance in that test. Structuring an arithmetical problem preparatory 
to its solution is a form of production of a conceptual pattern. One puzzling 
consideration, however, is the fact that Essential Operations, which was 
designed to examine hypothesis Ic, did not come out on factor J. It may be 
that the form of that test is not suitable for detecting variance in this factor 
and that some other form of item will be needed to isolate from the arith- 
metic-reasoning test that particular step of defining the problem. 
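
The remark that much true variance in Problem Solving is still to be accounted for can be illustrated with the tabled figures. The sketch assumes the usual partition in which reliable variance not covered by the communality is left to specific or as yet unidentified factors. The function name is ours, and the values are the reliability of test 9 from Table 2 and its communality from Table 5, with decimal points restored.

```python
def unexplained_true_variance(reliability, communality):
    """Reliable variance not accounted for by the common factors."""
    return reliability - communality

# Test 9, Problem Solving: reliability .80 (Table 2), h² .630 (Table 5).
remaining = unexplained_true_variance(0.80, 0.630)
print(round(remaining, 2))
```

On this reading roughly a sixth of the total variance of the test is reliable but not yet attributed to any of the factors in the present analysis.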

The two solutions are in disagreement over the presence of the three 
Number and Operations Changes on this factor. If the Y solution is correct, 
since equations are involved in these tests, it may be that grasping an equation 
as a structure is the key to the presence of this factor. Reorganizing an 
equation involves the formation of a new structure of a kind that is not unlike 
the structuring needed in arithmetic-reasoning items. 


K. Eduction of correlates (EC)               X     Y
24 Correlate Completion                     .41   .32
10 Vocabulary                               .41   .24
15 Hidden Figures                           .40   .41
23 Figure Analogies Completion              .32   .37
8 Prescribed Relations                      .30   .36
20 Word Matrix                              .30   .40
34 Spatial Orientation                      .15   .38


Three tests were constructed to see whether a factor of this kind would 
emerge: Figure Analogies Completion, Prescribed Relations, and Verbal 
Analogies II. The first two of these are in the list above. It is possible to 
explain why Verbal Analogies II was not. The other two are of the com- 
pletion type, in which the examinee has to produce an answer. Verbal Ana- 
logies II is a multiple-choice test in which answers are given. Correlate 
Completion is essentially an analogies test in completion form. It is probable 
that almost any form of analogies test that calls for the production of an 
answer will be substantially weighted with this factor, which we have called 
“eduction of correlates.” 

The presence of some other tests in the list calls for some attempts at 
explanation. Vocabulary tests are usually of complexity one for the verbal- 
comprehension factor, though some occasionally have secondary variances 
in reasoning abilities. The vocabulary test that we used presents each word 
to be defined in a brief context. It is possible that the context furnishes just 
enough in the form of relationships in some instances to make possible a 










correct answer when the word itself is unknown. Very liberal time was allowed 
for the examinees to complete this test, a condition favorable for the use of 
such secondary cues. 

There is no obvious explanation for the presence of the Hidden Figures 
test in this list. In the Air Force results it had shown an affinity for the test 
Figure Matrix, but in this analysis it separated from that test. In Thurstone’s 
analysis of perception, this test helped to define a new factor defined as the 
ability to effect closure against distractions (10). That factor accounted for 
only a portion of its true variance, however, so there is room for other variance. 
It is not likely that our factor K is the same as Thurstone’s closure factor. 

It should be mentioned that in the X rotations there seemed to be 
some genuine correlation between factor K and factor G, logical reasoning. 
Such a correlation could arise from the fact that no judgment as to the logical 
soundness of a conclusion can be made unless a conclusion has been educed. 
It is possible that an inference test in completion form would measure this 
factor of eduction of correlates. 


L. Symbol substitution (SS)                  X     Y
16 Number and Operations Changes I          .39   .32
17 Number and Operations Changes II         .34   .29
18 Number and Operations Changes III        .34   .24
32 Form Reasoning                           .3?  -.02
25 Secret Writing                           .3?   .31
31 Symbol Manipulation                      .28   .53


The most obvious thing that these tests have in common is the need for 
substituting one symbol for another. It is therefore hypothesized as a sym- 
bol-substitution ability. In every one of the tests in this list, there is a 
substitution of symbols according to rules. The symbols substituted take 
on the meanings or the functions of the symbols they replace. 

The factor might be of a more general nature than is demonstrated in 
these results. It might include all thinking in which new symbols are assigned, 
whether they replace old ones or not. Further research will be needed to 
examine the generality of this factor. It will be of interest, too, to determine 
relationships of this ability to performance in certain branches of mathe- 
matics and in symbolic logic. 


M. (Unidentified)                            Y
32 Form Reasoning                           .49
18 Number and Operations Changes III        .28
30 Perceptual Speed                         .26


No hypothesis regarding the possible nature of this factor is offered. It
might possibly be the factor that Blakey had found defined uniquely by
the Form Reasoning test. It seems likely, however, that the variance given
this factor by the Y solution belongs in factor L. The facility in switching
symbol meanings is an obvious feature of this test.


Discussion 


An attempt will now be made to relate the factors to the hypotheses 
set forth in preparation for this study and to previous factorial results. 
There are some general issues, also, that call for comment. 

Among the twelve obtained factors, seven are common to tests that 
were designed as reasoning tests. Whether all of these should be regarded 
as reasoning abilities is a matter of definition of the domain of reasoning. 

Factors were found corresponding to the four major hypotheses but not 
on a one-to-one basis. The well-established general-reasoning factor (factor 
F) was substantiated and new hypotheses concerning its properties that 
should be fruitful for future research were mentioned. 

The area described as reasoning II (in general, the inductive area) 
seems to require three dimensions to describe it factorially. These are factors 
H, I, and J, which involve educing relations and patterns. 

No factor was found that could be described as a classifying ability (hy- 
pothesis III). The classification tests had no common factor unique to them. 
Since non-reasoning factors account for the larger part of their variances, it 
would seem that classifying activities are not very much dependent upon 
reasoning. An exception to this is the involvement of general reasoning and 
this is probably true only when the classifying task becomes difficult. It 
appears that classifying tests have almost nothing to offer in the way of 
measurement of reasoning abilities. 

To meet the fourth major hypothesis, we found a factor that has formerly 
been called “deduction,” but which in our opinion should be called “logical
reasoning.” It is suggested that this is an ability to evaluate inferences and
may not be the same ability at all as that for drawing conclusions for one’s 
self. 

Next will be considered briefly which of the initial minor hypotheses 
were upheld and which ones were not, taking each of the four groups in turn. 
Some can be regarded as favored by the results, others as candidates for 
discarding. Most of them need further investigation. 

No results particularly favored hypotheses Ia, manipulating symbols, 
Ib, solving problems, or Id, testing hypotheses, as definitions of reasoning I. 
If the factor called “eduction of conceptual patterns” is sustained as defined 
here, reasoning I is not a matter of hypothesis Ic, defining problems. We 
had no good tests in the battery for testing Ie, organizing a sequence of steps. 
The two tests in which such a factor might have emerged separated in the 
analysis. This negative finding is not sufficient reason for rejecting this 
hypothesis without further investigation. 










Our factor J, eduction of conceptual patterns, is essentially in line with 
the idea of hypothesis IIa, seeing rules or principles. This kind of pattern is
meaningful and it may be based upon various kinds of material—forms, 
letters, or numbers. The distinction between a rule or principle, IIa, a system,
IIb, or a trend, IIc, may be actually insignificant from a psychological point 
of view. We had in the battery no really discriminating tests which would 
have made possible a clear separation of such abilities, if they are separate. 
The tests designed to examine the hypothesis of seeing (educing) relations, 
IId, divided into at least two groups—those involving figures in one group 
and those involving numbers and words in the other. The consequence was 
the identifying of two factors, H and I, educing perceptual relations and 
educing conceptual relations. Finally, there were no tests sufficiently unique 
to do justice to the hypothesis of seeing identity of relationships, IIe, and no
factor, even the one appearing in the test Hidden Figures, could be identified 
as IIf, analyzing forms. 

As was stated above, no classification factor of any kind emerged. We 
failed, then, to find any support for hypotheses IIIa, seeing common elements 
or properties, IIIb, classifying in general, or IIIc, classifying forms. This 
might be because seeing identities is a matter of seeing relationships, in the
sense that identity is a limiting case of relationship. The failure to find a
factor of this kind is somewhat curious in that traditional psychology has
made so much of the processes of abstraction, equivalent stimuli, and transfer
by reason of “identical elements.” A factor that could be described as IIId,
eduction of correlates, was demonstrated. It did not occur in classification
tests. It did occur to best advantage in completion tests rather than in
multiple-choice tests. It is possible that if any factor is to receive the label
“deduction” this should be it. It is suggested, however, that the term
“deduction” be dropped as a psychological concept.

The tests designed to examine hypothesis IVa, drawing inferences 
(deduction), would have led to the support of this idea over that of IVb, 
syllogistic reasoning, which restricts the supposed ability to formal, syllo- 
gistic tasks. The reason is that all the tests, formalized or not, came out 
together on a factor. The factor was not designated as deduction, however, 
but as “logical reasoning,” for reasons cited under factor G.

One obtained factor was not anticipated by any of the initial hypotheses, 
that of “symbol substitution.” Although the facile attachment and replacement
of symbols in connection with ideas is undoubtedly a feature of steps
in the solving of certain types of problems, such activity would hardly be 
described as a species of reasoning. It would, however, come under the general 
concept of thinking. 

A question of a more general nature that we hoped would be answered 
to some degree by this study should be mentioned. It appears that the 
reasoning factors do not always transcend material lines. There is no evidence
of a distinction between verbal and numerical reasoning, but there is evidence 
of a distinction between concrete and abstract reasoning in the two factors 
eduction of perceptual relations (concrete reasoning) and eduction of con- 
ceptual relations (abstract reasoning). That is as far as the distinction goes, 
so far as our evidence is concerned. We may be justified, however, in looking 
for a factor of “eduction of perceptual patterns” as a possible parallel to the
obtained factor of eduction of conceptual patterns. 

Which of the reasoning factors are identifiable with those previously 
found? In answering this question, we will consider only the major studies 
that were given most attention in the planning of this investigation. General 
reasoning corresponds to Thurstone’s restrictive reasoning and the Army 
Air Force reasoning I. Logical reasoning parallels Thurstone’s and Zimmer- 
man’s deduction, but a significantly different interpretation has been given 
to it. 

Thurstone’s induction factor may either be regarded as having been 
analyzed in three new dimensions—eduction of perceptual relations, eduction 
of conceptual relations, and eduction of conceptual patterns—or it may be 
regarded as the same factor as the last of these three. The latter also has 
some resemblance to the Air Force reasoning III. The eduction of perceptual 
relations and the eduction of conceptual relations have no exact antecedents 
in previous factorial results, except that the former is probably equivalent 
to the Air Force reasoning II. 

Eduction of correlates and symbol substitution, as defined, have no 
parallels in previous factorial findings. Eduction of correlates comes closest 
to what should be called deduction, but it would be a somewhat special 
case of deduction found in reasoning by analogy. It is our present belief
that a genuine deduction factor has not been found and that if it exists it 
will take completion tests to find it. 

In closing, we give passing attention to the problem of the definition of 
reasoning. We have been referring to several factors as reasoning factors. 
Obviously, whether or not any factor is a reasoning factor depends upon the 
definition of reasoning. We actually started the study with two definitions. 
One was a logical definition that referred to thinking activities directed 
toward solving unfamiliar problems. The other was an operational definition 
that referred implicitly to a collection of tests constructed as reasoning 
tests. We now have a third possibility. We can define reasoning in terms of 
the factors obtained from analyses such as this one. Having decided upon 
which factors should be in the list, we could then frame a logical statement 
to cover them. This statement might serve as a new definition of reasoning. 
It is our belief that if reasoning is to be defined, these steps will eventually 
give us an acceptable, operational definition, with much stability and de- 
pendability. We do not feel that the time is ripe for this until more is known 
about the factors. 








REFERENCES


1. Blakey, R. I. A factor analysis of non-verbal reasoning tests. Educ. psychol. Measmt.,
   1941, 1, 187-198.
2. Green, R. F. A factor-analytic study of reasoning abilities. Ph.D. dissertation,
   University of Southern California Library, 1951.
3. Guilford, J. P. (Ed.) Printed Classification Tests. Army Air Forces Aviation Psychology
   Reports, Report No. 5. Washington, D.C.: Government Printing Office, 1947.
4. Guilford, J. P., Comrey, A. L., Green, R. F., and Christensen, P. R. A factor-analytic
   study of reasoning abilities, I. Hypotheses and description of tests. Reports from the
   Psychological Laboratory, No. 1. The University of Southern California, 1950.
5. Guilford, J. P., Green, R. F., and Christensen, P. R. A factor-analytic study of
   reasoning abilities, II. Administration of tests and analysis of results. Reports from the
   Psychological Laboratory, No. 3. The University of Southern California, 1951.
6. Heidbreder, E. An experimental study of thinking. Arch. Psychol., 1924, 11, No. 73.
7. Michael, W. B., Zimmerman, W. S., and Guilford, J. P. An investigation of two
   hypotheses regarding the nature of spatial relations and visualization factors. Educ.
   psychol. Measmt., 1950, 10, 187-213.
8. Spearman, C. The abilities of man. New York: The Macmillan Company, 1927.
9. Thurstone, L. L. Primary mental abilities. Psychometric Monographs No. 1. Chicago:
   Univ. Chicago Press, 1938.
10. Thurstone, L. L. A factorial study of perception. Psychometric Monographs No. 4.
    Chicago: Univ. Chicago Press, 1944.
11. Zimmerman, W. S. A simple graphical method for orthogonal rotation of axes.
    Psychometrika, 1946, 11, 51-55.
12. Zimmerman, W. S. The isolation, definition, and measurement of spatial-visualization
    abilities. Ph.D. dissertation, University of Southern California Library, 1949.


Manuscript received 9/5/52 


Revised manuscript received 11/7/52 











PSYCHOMETRIKA—VOL. 18, No. 2 
JUNE, 1953 


A METHOD FOR FACTORING LARGE NUMBERS OF ITEMS 


ROBERT J. WHERRY
THE OHIO STATE UNIVERSITY 
AND 
BEN J. WINER 


UNIVERSITY OF NORTH CAROLINA 


The computation of intercorrelation matrices involving large numbers 
of variables and the subsequent factoring of these matrices present a formi- 
dable task. A method for estimating factor loadings without computing the 
intercorrelation matrix is developed. The estimation procedure is derived 
from a theoretical model which is shown to be a special case of the multiple- 
group centroid method of factoring. Empirical checks have indicated that the 
model, even though it makes some stringent assumptions, can be applied to a 
variety of variables found in psychological factoring problems. It has been 
found to be particularly useful in factoring test items. 


Historical Introduction 


While some variant of the Thurstone group factoring method is quite 
satisfactory for factoring 15 or 30 or possibly even 50 tests or items, the 
method involves intercorrelation and residual tables of order $n^2$, i.e., for
200 items such tables would have 40,000 entries each. Thus large numbers 
of items make some other approach desirable if not absolutely necessary. 

Wherry and Gaylord in 1943 (10) suggested an iterative approach, 
based upon successively determined r;, values, for factoring items. This 
iterative approach was used successfully in several major studies. Wherry 
(9) factored 292 rating scale descriptive phrases concerning Army officers 
in 1950. The iterative approach was also used in three doctoral disserta- 
tions at Ohio State University: Gordon (3) factored 300 personality test 
items in 1950; Phelps (7) factored 114 need-activity items in 1951; and 
Lucas (5) factored 90 need-satisfaction items in 1951. 

At the American Psychological Association meetings in Chicago in 1951, 
Loevinger, et al. (4) and Gleser, et al. (2) presented a slightly different 
iterative approach, which involved the obtaining of the intercorrelations 
among items within various subtests in addition to several refined methods 
for testing and refining the clusters. Final clusters were apparently left 
oblique although intercorrelations were minimized. 

Wherry, Perloff, and Campbell (11) showed that item factors obtained
by the Wherry-Gaylord approach were the same as those obtained by factor- 
ing the intercorrelations among thirteen arbitrary subtests set up by expert 
opinion. The thirteen subtests had been selected in an attempt to speed 
up the original Wherry-Gaylord method, but had iterated back into only 
three patterns. The present paper grew out of further consideration of prin- 
ciples presented in that paper. 

In the autumn of 1950, two large item-factoring problems were begun 
at Ohio State. The Personnel Research Board was preparing to factor 120 
leadership description items. Wherry was preparing to analyze 300 teacher 
description items. The present method was worked out to achieve common 
factors without iteration and at the same time preserve any specific factors 
which might be found in any of the expert-constructed a priori subtests. 
A direct method of factoring items was worked out at that time by analogy 
with the traditional group factoring method for tests. 


The Direct Method of Factoring 


In the standard multiple-group centroid analysis, the table of inter- 
correlations of the basic variables is partitioned into groups which have 
relatively high within-group correlations and relatively low correlations 
with variables in other groups. Each of the groups $(A, B, \ldots, K, \ldots, M)$
defines a centroid reference vector in an oblique reference frame. These
centroid reference vectors will be designated $X_A, X_B, \ldots, X_K, \ldots, X_M$.
The matrix of oblique factor loadings is obtained from projections of the
variables onto these centroid vectors. For those variables belonging to an
arbitrary group K, the projections on centroid vector $X_K$ would be given by

$$r_{iK} = \frac{h_{ii}^2 + \sum_j r_{ij}}{\sqrt{\sum_j h_{jj}^2 + S_K}} \qquad (i, j \text{ in } K;\ i \neq j), \tag{1a}$$

where the notation $S_K$ indicates the sum of all elements in the matrix of
intercorrelations of the variables in group K excepting the elements of the
form $r_{ii}$. For variables not in group K, the projections on centroid vector
$X_K$ are given by

$$r_{iK} = \frac{\sum_j r_{ij}}{\sqrt{\sum_j h_{jj}^2 + S_K}} \qquad (i \text{ not in } K;\ j \text{ in } K). \tag{1b}$$


The cosines of the angular separations of the centroid vectors are given
by

$$r_{KL} = \frac{\sum_i \sum_j r_{ij}}{\sqrt{\sum_i h_{ii}^2 + S_K}\,\sqrt{\sum_j h_{jj}^2 + S_L}} \qquad (i \text{ in } K;\ j \text{ in } L). \tag{2}$$
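
As an illustration only, equations (1a), (1b), and (2) can be sketched in a few lines of Python. The function name, the list-of-lists layout, and the one-factor test data below are assumptions made for this sketch and do not come from the original article; communalities are supplied by the caller.

```python
import math

def centroid_projections(R, groups, h2):
    """Oblique centroid projections (eqs. 1a, 1b) and centroid cosines (eq. 2).

    R      -- full correlation matrix as a list of lists
    groups -- list of lists of variable indices, one list per group
    h2     -- communality estimate for every variable
    """
    # Length of each centroid: sqrt(sum of communalities + off-diagonal sum S_K).
    lengths = []
    for g in groups:
        s_k = sum(R[i][j] for i in g for j in g if i != j)
        lengths.append(math.sqrt(sum(h2[j] for j in g) + s_k))

    projections = []
    for i in range(len(R)):
        row = []
        for g, length in zip(groups, lengths):
            total = sum(R[i][j] for j in g if j != i)
            if i in g:
                total += h2[i]  # eq. (1a): the communality stands in for r_ii
            row.append(total / length)
        projections.append(row)

    m = len(groups)
    cosines = [[1.0] * m for _ in range(m)]
    for k in range(m):
        for l in range(m):
            if k != l:
                cross = sum(R[i][j] for i in groups[k] for j in groups[l])
                cosines[k][l] = cross / (lengths[k] * lengths[l])
    return projections, cosines


# Hypothetical one-factor data: each r_ij is the product of two loadings.
loadings = [.10, .30, .40, .50, .60, .80, .00, .20, .50, .70]
R = [[1.0 if i == j else a * b for j, b in enumerate(loadings)]
     for i, a in enumerate(loadings)]
h2 = [a * a for a in loadings]
groups = [[0, 1, 2, 3, 4, 5], [6, 7, 8, 9]]
projections, cosines = centroid_projections(R, groups, h2)
```

With a single common factor and exact communalities, every projection reproduces the item's loading and the two centroids coincide, which gives a convenient check on the formulas.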










The process of transforming projections from oblique to orthogonal reference 
frame can readily be interpreted as a part-correlation procedure. In carrying 
out the process, one selects an arbitrary vector as the pivot; after the pivot 
vector has been selected, the order in which the correlation between vectors, 
represented by the cosines of the angles between vectors, is ‘‘parted out” 
is arbitrary. The various solutions obtained by selecting an arbitrary vector 
as a pivot and arbitrary order thereafter are equivalent in the sense that 
one can be rotated into any other. 

Let us designate the orthogonal reference axes obtained by the transformation
process as $X_I, X_{II}, \ldots, X_k, \ldots, X_m$. Suppose we select reference
vector $X_A$ as the pivot vector. The projections of the variables on the
orthogonal axes could then be expressed by the following part-correlations:

$$\begin{aligned}
r_{iI} &= r_{iA}, \\
r_{iII} &= r_{i(B \cdot A)}, \\
r_{iIII} &= r_{i(C \cdot AB)}, \\
&\;\;\vdots \\
r_{im} &= r_{i(M \cdot ABC \cdots (M-1))},
\end{aligned} \tag{3}$$

where the notation $r_{i(j \cdot kl)}$ designates the part-correlation of variable i with
variable j (i.e., the correlation of variable i with that part of variable j which
is independent of variables k and l). A modified Doolittle solution applied
to the matrix of the intercorrelations of the centroid reference vectors
provides a convenient method for computing the matrix for effecting this
transformation. This method is outlined in a later section of this article.
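
The successive part-correlations can be illustrated numerically. The sketch below is not the Doolittle worksheet of the article; it swaps in a Cholesky factorization of the centroid intercorrelations, which amounts to orthogonalizing the reference vectors in the order A, B, C and therefore gives the same part-correlations. The function names and all numerical values are hypothetical.

```python
import math

def cholesky_lower(R):
    """Lower-triangular L with L L^T = R (R symmetric, positive definite)."""
    n = len(R)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(R[i][i] - s)
            else:
                L[i][j] = (R[i][j] - s) / L[j][j]
    return L

def orthogonal_projections(p, L):
    """Solve L f = p by forward substitution.

    p holds one variable's projections on the oblique vectors A, B, C, ...;
    f holds its projections on the orthogonal axes I, II, III, ...
    """
    f = []
    for k in range(len(p)):
        s = sum(L[k][j] * f[j] for j in range(k))
        f.append((p[k] - s) / L[k][k])
    return f

# Hypothetical cosines between three centroid vectors, pivot vector A first.
R_c = [[1.0, 0.4, 0.3],
       [0.4, 1.0, 0.5],
       [0.3, 0.5, 1.0]]
p = [0.6, 0.5, 0.4]          # hypothetical r_iA, r_iB, r_iC for one variable
f = orthogonal_projections(p, cholesky_lower(R_c))
```

The first element of `f` is the pivot projection unchanged, and the second agrees with the first-order part-correlation of the variable with B, holding A constant.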


An Indirect Method for Estimating Factor Loadings 


In equations (1a) and (1b) it will be noted that only the sums of subsets
of the $r_{ij}$ in the intercorrelation matrix enter into the computations. It
is the purpose of the indirect factor method to obtain direct estimates of the
sums needed in equations (1a) and (1b) which do not involve using the
individual $r_{ij}$. Since this method of factoring has been found to be
particularly useful in working with test items, it will be described in the
item-factoring setting. If the assumptions underlying the estimation processes
are met, the procedures to be described are, however, quite general in
application.

One starts the analysis with a large pool of items (e.g., 200 items). 
The items are grouped into subtests on the basis of expert opinion. Fortu- 
nately this expert opinion need not be too expert. One necessary restriction 
is that the experts do not achieve the rather impossible task of creating 
completely alternate forms. Since the purpose of the grouping of the items 
by the experts is the exact opposite of this condition, subtests resulting 
from areas of agreement among the expert judgments will generally prove to 


la. 


ete ¥. &: 


3 = 


Fa} 








164 PSYCHOMETRIKA 


be satisfactory as a starting point for the analysis. A second necessary con- 
dition is that the number of subtests set up be at least as great as the rank 
of the matrix of inter-item correlations. 

All items are then administered to a population of N persons (preferably 
at least 100), whose responses are to be analyzed. All papers are scored on 
each of the established subtests and the matrix of intercorrelations between 
the subtests computed. In addition one computes the correlations between 
each item and each of the subtests.* With limitations to be indicated, these 
steps provide a set of data directly analogous to that required by the multiple- 
group factor method. Designating the correlation of an item with an arbitrary
subtest by $r_{iK'}$ and the correlation between any two subtests by $r_{K'L'}$, we
have

$$r_{iK'} \cong r_{iK}, \qquad r_{K'L'} \cong r_{KL}. \tag{4}$$

In equation (4) the subtests are considered to define vectors analogous to
centroids of groups. Converting the oblique subtest vectors into orthogonal
coordinate axes, we have

$$\begin{aligned}
r_{iI} &= r_{iA} \cong r_{iA'}, \\
&\;\;\vdots \\
r_{im} &= r_{i(M \cdot AB \cdots (M-1))} \cong r_{i(M' \cdot A'B' \cdots (M-1)')}.
\end{aligned} \tag{5}$$
By means of equations (5) the correlations of items with correlated sub- 
tests are converted into factor loadings on a set of orthogonal reference 
vectors. The latter vectors can then be rotated to psychological meaning- 
fulness. 

Equations (4) and (5) are based upon the assumption that the vectors 
defined by the scores on the subtests are satisfactory estimates of the centroid 
vectors that would be defined by the centroids of the corresponding inter- 
item correlations. One apparent limitation of this assumption is that $r_{iK'}$
would reflect a spurious effect contributed by the correlation of an element 
with a total of which it is a part and hence include the contribution of a 
specific factor. However, even with very crude corrections for this fact, 
the direct use of equations (4) and (5) yielded highly meaningful factors and 
loadings in the studies which have been indicated above. Methods for elim- 
inating this source of spurious correlation are developed in what follows. 

Further considerations show two not so apparent limitations in the
estimation procedure. For those items not in subtest K', $r_{iK'}$ will in general
be an underestimate of $r_{iK}$, since items that might belong to K, but are not
included in K', will not be properly weighted in defining the vector $X_K$.
There is also a bias in $r_{K'L'}$ as an estimate of $r_{KL}$, since the intercorrelations
of the subtests will in general be underestimates of the correlations between
groups in the common-factor space. The following sections point out the
origin of these biases or spurious elements in the estimates and develop
adjustment procedures whereby statistics based upon the subtests more
closely approximate those based upon the actual centroids.

*For purposes of the present development we assume all correlations to be
product-moment. In actual applications of the method, tetrachorics are recommended.
A fuller discussion of the type of correlation coefficient appropriate for the analysis
appears in a later section.


The Transformation of Item-subtest Correlation Coefficients into 
Projections on Group Centroid Vectors 


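Case 1. Items in Subtest

The product-moment correlation of an item with a subtest of which it
is a part is given by

$$r_{iK'} = \frac{\sigma_i + \sum_j \sigma_j r_{ij}}{\sqrt{\sum_j \sigma_j^2 + \sum_j \sum_k \sigma_j \sigma_k r_{jk}}} \qquad (i, j, k \text{ in } K';\ j \neq i;\ k \neq j). \tag{6}$$

If we assume that the standard deviations of the items are equal, we have

$$r_{iK'} = \frac{1 + \sum_j r_{ij}}{\sqrt{n_{K'} + S_{K'}}}. \tag{7}$$

It will be noted that equation (7) has the same form as equation (1a). In
order to express $r_{iK}$ as a function of $r_{iK'}$, we note that

$$\sigma_{K'}^2 = \sum_j \sigma_j^2 + \sum_j \sum_k \sigma_j \sigma_k r_{jk}. \tag{8}$$

If we assume $\sigma_i = \sigma_j$ for all items in the subtest, equation (8) becomes

$$\sigma_{K'}^2 = \sigma_i^2 (n_{K'} + S_{K'}). \tag{9}$$

Substituting from (9) into (7), we have

$$r_{iK'} = \frac{1 + \sum_j r_{ij}}{\sigma_{K'}/\sigma_i} = \frac{1 + \sum_j r_{ij}}{a_{K'}}, \tag{10}$$

where $a_{K'} = \sigma_{K'}/\sigma_i$. From (9) we also note that

$$S_{K'} = \frac{\sigma_{K'}^2}{\sigma_i^2} - n_{K'} = a_{K'}^2 - n_{K'}. \tag{11}$$

Assuming $S_{K'}$ equal to $S_K$, we can substitute in (1a) to obtain

$$r_{iK} = \frac{h_{ii}^2 + \sum_j r_{ij}}{\sqrt{a_{K'}^2 - \sum_j (1 - h_{jj}^2)}}. \tag{12}$$
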
If now we solve equation (10) for $\sum_j r_{ij}$ and substitute in equation (12), we
have

$$r_{iK} = \frac{r_{iK'} a_{K'} - (1 - h_{ii}^2)}{\sqrt{a_{K'}^2 - \sum_j (1 - h_{jj}^2)}}, \tag{13}$$

or computationally

$$r_{iK} = \frac{(r_{iK'} a_{K'} - 1) + h_{ii}^2}{\sqrt{(a_{K'}^2 - n_{K'}) + \sum_j h_{jj}^2}}. \tag{13a}$$

If the items in a subtest are factorially homogeneous, the average correlation
of a given item with all other items in the subtest can be used as
a first estimate of its communality. Assuming $\sum_j h_{jj}^2 = S_{K'}/(n_{K'} - 1)$,
equation (13) becomes upon simplification

$$r_{iK} = \frac{r_{iK'} a_{K'} - (1 - h_{ii}^2)}{\sqrt{\dfrac{n_{K'}}{n_{K'} - 1}\,(a_{K'}^2 - n_{K'})}}. \tag{14}$$

In order to estimate the value of a single communality, let us assume

$$h_{ii}^2 = \frac{\sum_j r_{ij}}{n_{K'} - 1}.$$

Under this assumption, equation (10) becomes

$$r_{iK'} = \frac{1 + (n_{K'} - 1) h_{ii}^2}{a_{K'}}. \tag{15}$$

Solving (15) for $h_{ii}^2$, one obtains

$$h_{ii}^2 = \frac{r_{iK'} a_{K'} - 1}{n_{K'} - 1}. \tag{16}$$

Substituting from (16) into (14), we have after simplification

$$r_{iK} = \frac{r_{iK'} a_{K'} - 1}{\sqrt{\dfrac{n_{K'} - 1}{n_{K'}}\,(a_{K'}^2 - n_{K'})}}. \tag{17}$$

Equation (17) admittedly involves several assumptions and approximations,
but it does provide a useful tool for getting first approximations
to $r_{iK}$. By means of the relationship

$$r_{iK}^2 = h_{ii}^2,$$

equation (13) can then be used successively as an iterative device to secure
better estimates. Iteration can be concluded when none of the values change
materially (say not more than .01) after iteration.










A numerical example will serve to illustrate the technique. Suppose
that we have a subtest of six items, each with a difficulty of .50 and with
unknown factor loadings of

Item       1     2     3     4     5     6
Loading   .10   .30   .40   .50   .60   .80

The item intercorrelations, also unknown, would be

Item     1      2      3      4      5      6      Σ
1      1.00    .03    .04    .05    .06    .08   1.26
2       .03   1.00    .12    .15    .18    .24   1.72
3       .04    .12   1.00    .20    .24    .32   1.92
4       .05    .15    .20   1.00    .30    .40   2.10
5       .06    .18    .24    .30   1.00    .48   2.26
6       .08    .24    .32    .40    .48   1.00   2.52

Σ      1.26   1.72   1.92   2.10   2.26   2.52  11.68

The following data about the subtest would be observed:

$n_{K'} = 6$; $M_{K'} = 3.00$; $\sigma_{K'}^2 = 2.92$; and

Item        1     2     3     4     5     6
$r_{iK'}$  .37   .50   .56   .61   .66   .74

From the information contained in these observed data, we can deduce the
unknown factor loadings.

Assuming the items are dichotomous, we would have

$$n_{K'}\, p_{K'} = M_{K'}, \tag{18}$$

where $p_{K'}$ is the mean difficulty. For the numerical example

$$p_{K'} = 3.00/6 = .50.$$

Then from the equation

$$\sigma_i^2 = p(1 - p) \tag{19}$$

we obtain

$$\sigma_i^2 = (.5)(.5) = .25,$$

whence

$$a_{K'}^2 = \sigma_{K'}^2/\sigma_i^2 = 2.92/.25 = 11.68.$$
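
The arithmetic of equations (18) and (19) can be written as a short Python sketch. The function name is an assumption introduced here for illustration, and the sketch presumes dichotomous items of equal difficulty, as in the example.

```python
def squared_length_ratio(n_items, subtest_mean, subtest_variance):
    """Return a_K'^2 from observable subtest statistics (dichotomous items)."""
    p = subtest_mean / n_items           # eq. (18): mean item difficulty
    item_variance = p * (1.0 - p)        # eq. (19): variance of a 0/1 item
    return subtest_variance / item_variance   # a_K'^2 = sigma_K'^2 / sigma_i^2

# Observed statistics quoted in the numerical example.
a2 = squared_length_ratio(n_items=6, subtest_mean=3.00, subtest_variance=2.92)
```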


Either equation (17) or equation (13) can now be used to obtain a first
approximation to $r_{iK}$. If equation (13) is used, $r_{iK'}^2$ can be used as an
estimate of $h_{ii}^2$. Let us designate this first approximation to $r_{iK}$ by either equation
(17) or (13) as ${}_1r_{iK}$. More exact approximations can be obtained by using
equation (13) iteratively. The results of continuing this iterative process
through three steps for the numerical example under consideration are given
in Table 1.


TABLE 1
Illustrative Example of Iterative Procedure

($n_{K'} = 6$; $a_{K'} = \sqrt{11.68} = 3.418$)

Item  $a_{K'}r_{iK'} - 1$  $r_{iK'}^2$  ${}_1h_{ii}^2$  ${}_2h_{ii}^2$  $r_{iK'}$  ${}_1r_{iK}$  ${}_2r_{iK}$  ${}_3r_{iK}$  $r_{iK}$

1          0.265           .137     .021     .011     .37     .145     .106     .103     .10
2          0.709           .250     .119     .095     .50     .345     .308     .300     .30
3          0.914           .314     .195     .171     .56     .441     .413     .405     .40
4          1.085           .372     .275     .256     .61     .524     .506     .500     .50
5          1.256           .436     .370     .366     .66     .608     .605     .605     .60
6          1.529           .548     .558     .605     .74     .747     .778     .796     .80
Σ                         2.057    1.538    1.504

$D^2 = (a_{K'}^2 - n_{K'}) + \sum h_{ii}^2$

$D^2 = 5.680 + \sum h_{ii}^2$ =   7.737    7.218    7.184
$1/D$ =                           .3595    .3722    .3731

In this example, the iterative process converges quite rapidly; after 
three steps all estimates are within .005 of the theoretical values. It should 
be noted that this iterative procedure is analogous to the usual iterated 
centroid method for stabilizing the estimates of the communalities and 
reduces to it if all assumptions implicit in equation (13) are met. 
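
The iteration can be checked with a short Python sketch that applies the computational form (13a) to the observed values of the example, starting from the squared item-subtest correlations as communality estimates. The function name and the fixed number of steps are assumptions of this sketch; because it carries unrounded intermediate values, its results may differ from Table 1 in the third decimal place.

```python
import math

def iterate_loadings(r_obs, a2, steps=3):
    """Repeated use of eq. (13a) for the items inside one subtest.

    r_obs -- observed item-subtest correlations r_iK'
    a2    -- a_K'^2, the subtest variance divided by the common item variance
    Returns the list of successive estimates of r_iK, one list per step.
    """
    n = len(r_obs)
    a = math.sqrt(a2)
    base = [a * r - 1.0 for r in r_obs]     # a_K' r_iK' - 1
    h2 = [r * r for r in r_obs]             # starting communalities: r_iK'^2
    history = []
    for _ in range(steps):
        d = math.sqrt((a2 - n) + sum(h2))   # denominator D of eq. (13a)
        estimates = [(b + h) / d for b, h in zip(base, h2)]
        history.append(estimates)
        h2 = [r * r for r in estimates]     # next communalities: r_iK^2
    return history

r_obs = [.37, .50, .56, .61, .66, .74]
history = iterate_loadings(r_obs, a2=11.68, steps=3)
final = history[-1]
```

A stopping rule based on the largest change between successive steps (say .01, as suggested in the text) could replace the fixed step count.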

The iterative procedure outlined here can be carried out simultaneously 
for several groups and convergence is usually quite rapid. The various
conditions under which convergence is not rapid or in which the convergence
is toward incorrect values have been investigated (12). It was found that
only in extreme cases not usually encountered with ordinary test data will 
the iterative process fail to converge to correct values. The values of the 
communalities computed in the manner described are, of course, approxi- 
mately equal to the square of the projections of the items on the centroid 
of the subtest containing the items. Additional increments to the communality 
of the items are obtained from projections of the items on the centroids of 
subtests which do not contain the items. 

It should be recalled that the method used in this section applies only 
to items included in subtests and assumes that items have approximately 
equal variances within any subtest. 


Case 2. Items Not in the Subtest


The adjustments necessary to obtain $r_{iK}$ from $r_{iK'}$ for items falling under
Case 2 are simpler than those made for items falling under Case 1. For
items not included in subtest K', the value of $r_{iK}$ is given by equation (1b).
Under the assumption of equal item variances for items in subtest K', we
have

$$r_{iK'} = \frac{\sum_j \sigma_j r_{ij}}{\sigma_{K'}} = \frac{\sum_j r_{ij}}{a_{K'}} \qquad (i \text{ not in } K';\ j \text{ in } K'). \tag{20}$$

If now we substitute in equation (1b) the value of $\sum_j r_{ij}$ given by equation
(20) and also for the denominator of equation (1b) the expression obtained
from equation (11), we have

$$r_{iK} = \frac{a_{K'} r_{iK'}}{\sqrt{a_{K'}^2 - \sum_j (1 - h_{jj}^2)}} = C_{K'} r_{iK'} \qquad (i \text{ not in } K'), \tag{21}$$

where

$$C_{K'} = \frac{a_{K'}}{\sqrt{a_{K'}^2 - \sum_j (1 - h_{jj}^2)}},$$

or computationally

$$C_{K'} = \frac{1}{\sqrt{1 - \dfrac{n_{K'} - \sum_j h_{jj}^2}{a_{K'}^2}}}. \tag{21a}$$

The denominator of the adjustment factor, $C_{K'}$, will have been computed in
the last step of the iteration process for the items in subtest K'. Thus all
of the values on the right side of equation (21) are known.

To illustrate the application of equation (21), suppose in the numerical
problem presented under Case 1, we also have items 7 through 10 which are
not included in subtest K'. Suppose the unknown theoretical factor loadings
for these items on the same factor considered above to be

Item       7     8     9     10
Loading   .00   .20   .50   .70

The unknown theoretical intercorrelations would be

Item     7      8      9      10
1       .00    .02    .05    .07
2       .00    .06    .15    .21
3       .00    .08    .20    .28
4       .00    .10    .25    .35
5       .00    .12    .30    .42
6       .00    .16    .40    .56
Σ       .00    .54   1.35   1.89



The observed $r_{iK'}$ values would therefore be

Item        7     8      9      10
$r_{iK'}$  .00   .16   .395    .55

We would therefore have

$$C_{K'} = 3.418/2.679 = 1.276.$$

From equation (21) we could compute the following values of $r_{iK}$:

Item       7      8      9      10
$r_{iK}$  .000   .204   .504   .702

It will be noted that $r_{iK'}$ values for items in Case 2 are underestimates
of the corresponding $r_{iK}$. In general the value of $C_{K'}$ in equation (21) will
be greater than unity. On the other hand for items falling in Case 1, $r_{iK'}$
overestimates the relatively low $r_{iK}$ and underestimates the relatively high
$r_{iK}$. One would expect the errors of estimation to be in this direction, since
the contribution of the specifics would be relatively greater for those items
having small $r_{iK}$ values.
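
The Case 2 adjustment can be sketched in Python as follows. The function name is hypothetical; the communalities are the second-step values from Table 1, taken here as the "last step" estimates, so the factor obtained agrees with the 1.276 of the text only to about the third decimal place.

```python
import math

def adjustment_factor(a2, h2):
    """C_K' of eq. (21): a_K' / sqrt(a_K'^2 - sum(1 - h_jj^2))."""
    n = len(h2)
    return math.sqrt(a2) / math.sqrt(a2 - n + sum(h2))

# Communalities of the six items in subtest K' after iteration (Table 1).
h2_last = [.011, .095, .171, .256, .366, .605]
c = adjustment_factor(11.68, h2_last)

# Observed item-subtest correlations for items 7 through 10 (not in K').
r_obs_outside = [.00, .16, .395, .55]
r_adjusted = [c * r for r in r_obs_outside]   # eq. (21): r_iK = C_K' r_iK'
```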


Transforming Correlations between Subtests 
into Correlations between Factors 


While the procedures developed so far have enabled us to estimate the
$r_{iK}$ from the $r_{iK'}$, we still need to convert correlations between subtests,
$r_{K'L'}$, into cosines of the angular separations of the centroid vectors $X_K$
and $X_L$ or, what is equivalent, into the correlations between subtests in the
common-factor space, $r_{KL}$. For the correlation between subtests K' and L',
assuming all items within each subtest have approximately equal difficulties
(but all subtests need not necessarily have the same difficulty), we have

$$r_{K'L'} = \frac{\sum_i \sum_j r_{ij}}{(\sigma_{K'}/\sigma_i)(\sigma_{L'}/\sigma_j)} = \frac{\sum_i \sum_j r_{ij}}{a_{K'} a_{L'}} \qquad (i \text{ in } K';\ j \text{ in } L'). \tag{22}$$

The correlation between these two subtests in the common-factor space
would, however, be

$$r_{KL} = \frac{\sum_i \sum_j r_{ij}}{\sqrt{a_{K'}^2 - \sum_i (1 - h_{ii}^2)}\,\sqrt{a_{L'}^2 - \sum_j (1 - h_{jj}^2)}} \qquad (i \text{ in } K';\ j \text{ in } L'). \tag{23}$$

Dividing numerator and denominator of equation (23) by $a_{K'} a_{L'}$, we obtain

$$r_{KL} = \frac{r_{K'L'}}{\sqrt{1 - \dfrac{\sum_i (1 - h_{ii}^2)}{a_{K'}^2}}\,\sqrt{1 - \dfrac{\sum_j (1 - h_{jj}^2)}{a_{L'}^2}}} = C_{K'} C_{L'} r_{K'L'}, \tag{24}$$

where $C_{K'}$ and $C_{L'}$ are adjustment factors that will have already been
computed in the process of working with equation (21).

In order to obtain an estimate of $r_{KL}$ prior to the adjustment of the
$r_{iK'}$ and $r_{iL'}$, a slightly less exact variant of equation (24) can be obtained.
Using the same logic that led to equation (14), we can write

$$r_{KL} = \frac{r_{K'L'}}{\sqrt{\dfrac{n_{K'}}{n_{K'} - 1}\left(1 - \dfrac{n_{K'}}{a_{K'}^2}\right)}\,\sqrt{\dfrac{n_{L'}}{n_{L'} - 1}\left(1 - \dfrac{n_{L'}}{a_{L'}^2}\right)}}. \tag{25}$$

By using equation (25) as a first approximation to $r_{KL}$, one can employ the
Doolittle process to be described in the next section in order to check the
possibility that the subtests are linearly dependent. (A necessary condition
for groups in the multiple-group centroid method is that they be linearly
independent.) Using the Doolittle procedure to be described, but without
the horizontal extension, it will be possible to discover which subtests have
leading A-row entries less than say .10. Since most of the subtests that
fall in this latter category can have the largest part of their variance
satisfactorily accounted for by other subtests, in order to stabilize the inverse
of the matrix that must be inverted these subtests are eliminated from
further consideration. After $r_{iK'}$ values have been adjusted, one should
recompute the $r_{KL}$ from the more precise estimate given by equation (24).
The values given by equation (24) should be used in obtaining the
transformation matrix described in the next section.
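
A minimal Python sketch of the two adjustments follows. The function names and the demonstration values for the second subtest are hypothetical; only $n_{K'} = 6$ and $a_{K'}^2 = 11.68$ come from the numerical example.

```python
import math

def first_pass_factor(n, a2):
    """Approximate C_K' implied by eq. (25); needs only n_K' and a_K'^2."""
    return 1.0 / math.sqrt(n / (n - 1.0) * (1.0 - n / a2))

def common_factor_correlation(r_subtests, c_k, c_l):
    """Eq. (24): r_KL = C_K' C_L' r_K'L'."""
    return c_k * c_l * r_subtests

# First-pass factor for the six-item subtest of the numerical example.
c_first = first_pass_factor(6, 11.68)

# Hypothetical refined factors and observed subtest correlation.
r_kl = common_factor_correlation(0.30, c_k=1.20, c_l=1.25)
```

For the example subtest the first-pass factor comes out somewhat larger than the refined value of about 1.28 obtained after iteration, which is in keeping with its description as a less exact variant.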


A Method of Securing the Transformation Matrix for Converting Projections 
on Oblique Reference Vectors to Projections on Orthogonal Axes 


We now have information identical with that obtained in the regular 
group factor method. The explanation given by Thurstone for obtaining the 
transformation in terms of matrix manipulation can be clarified and perhaps 
made easier to follow when the procedures are viewed in terms of the part- 
correlation coefficient. 

Dunlap and Cureton (1) give the formula for the part-correlation 
coefficient as follows: 


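$$r_{i(B \cdot A)} = \frac{r_{iB} - r_{iA}\,r_{AB}}{\sqrt{1 - r_{AB}^{2}}} \tag{26}$$

This formula can be rewritten as

$$r_{i(B \cdot A)} = r_{iA}\left(\frac{-r_{AB}}{\sqrt{1 - r_{AB}^{2}}}\right) + r_{iB}\left(\frac{1}{\sqrt{1 - r_{AB}^{2}}}\right). \tag{26a}$$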


It will be recalled from an earlier section that 


$$r_{iA} = r_{iA}(1.000) + r_{iB}(.000). \tag{27}$$
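As a small numerical check on equations (26) and (26a), the sketch below evaluates the part correlation both ways. It assumes the usual form of the part-correlation coefficient (item i against the part of factor B that is independent of factor A); the function names and the example values are hypothetical.

```python
import math


def part_correlation(r_iA, r_iB, r_AB):
    """Equation (26): item i against the part of B independent of A."""
    return (r_iB - r_iA * r_AB) / math.sqrt(1.0 - r_AB ** 2)


def part_correlation_weighted(r_iA, r_iB, r_AB):
    """Equation (26a): the same quantity as a weighted sum of r_iA and r_iB."""
    s = math.sqrt(1.0 - r_AB ** 2)
    return r_iA * (-r_AB / s) + r_iB * (1.0 / s)


# Made-up values for one item and two correlated factors.
print(part_correlation(0.5, 0.7, 0.6))
print(part_correlation_weighted(0.5, 0.7, 0.6))
```

The two weights in the second function are the entries that appear in the second column of the transformation matrix for the two-factor case.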










These last two equations can be expressed in matrix form as follows: 








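$$
\begin{bmatrix}
r_{1A} & r_{1B} \\
r_{2A} & r_{2B} \\
\vdots & \vdots \\
r_{iA} & r_{iB} \\
\vdots & \vdots \\
r_{nA} & r_{nB}
\end{bmatrix}
\begin{bmatrix}
1.000 & \dfrac{-r_{AB}}{\sqrt{1 - r_{AB}^{2}}} \\[2ex]
.000 & \dfrac{1}{\sqrt{1 - r_{AB}^{2}}}
\end{bmatrix}
=
\begin{bmatrix}
r_{1\mathrm{I}} & r_{1\mathrm{II}} \\
r_{2\mathrm{I}} & r_{2\mathrm{II}} \\
\vdots & \vdots \\
r_{i\mathrm{I}} & r_{i\mathrm{II}} \\
\vdots & \vdots \\
r_{n\mathrm{I}} & r_{n\mathrm{II}}
\end{bmatrix}
$$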











Or, if we designate the matrix of projections on the oblique axes by P and
the transformation matrix by T, we have

$$PT = F, \tag{28}$$

where F is the matrix of projections on orthogonal axes. It is our purpose
to show that the transformation matrix T can be built up by a successive
part-correlation procedure.

We start with the table of intercorrelations of the factors, thus


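$$
\begin{array}{l|cc}
 & A & B \\ \hline
A & 1.000 & r_{AB} \\
B & r_{AB} & 1.000
\end{array}
$$

To this we append, on the right, a diagonal matrix with −1.000 values in the
diagonal, thus

$$
\begin{array}{l|cc|cc}
 & A & B & \multicolumn{2}{c}{-I} \\ \hline
A & 1.000 & r_{AB} & -1.000 & .000 \\
B & r_{AB} & 1.000 & .000 & -1.000
\end{array}
$$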


Performing the usual Doolittle operations on this last expression yields 
in literal form 























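$$
\begin{array}{l|cc|cc}
 & A & B & \multicolumn{2}{c}{-I} \\ \hline
A(A) & 1.000 & r_{AB} & -1.000 & .000 \\
A(R) & -1.000 & -r_{AB} & 1.000 & .000 \\ \hline
B(1) & & 1.000 & .000 & -1.000 \\
B(2) & & -r_{AB}^{2} & r_{AB} & .000 \\
B(A) & & 1 - r_{AB}^{2} & r_{AB} & -1.000 \\
B(R) & & -1.000 & -r_{AB}/(1 - r_{AB}^{2}) & 1/(1 - r_{AB}^{2})
\end{array}
$$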



















In this schematic Doolittle, the notation X(A) indicates a row obtained by 
addition; the notation X(R) indicates a row obtained by multiplying by a 
negative reciprocal. If one forms the matrix E by using the rows in the
X(R)-row entries of the extended portion of the Doolittle table (the columns
headed −I above) as columns in the matrix E, one would have

$$E = \begin{bmatrix} 1.000 & -r_{AB}/(1 - r_{AB}^{2}) \\ .000 & 1/(1 - r_{AB}^{2}) \end{bmatrix}. \tag{29}$$

Comparison of the matrix E with the matrix T will show that they are
identical except for the denominators in the second column. To complete
the identity, it will be necessary to set up a diagonal matrix, D, by using
the square roots of the leading X(A)-row entries in the Doolittle as diagonal
entries, thus

$$D = \begin{bmatrix} 1.000 & .000 \\ .000 & \sqrt{1 - r_{AB}^{2}} \end{bmatrix}. \tag{30}$$

We now have the relationship

$$ED = T. \tag{31}$$


Problems involving more than two factors can be solved in identical 
fashion by the inclusion in the extended portion of the Doolittle of as many 
entries as there are factors. The X(R)-row entries of this extended matrix
will form the columns of the E matrix. The square roots of the X(A)-row 
leading entries will form an expanded diagonal matrix D. The expanded 
transformation will be given by equation (31). The proof that the elements 
in the matrix product PT are actually part-correlation coefficients can be 
readily established by writing out the complete literal solution for the general 
case. 
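The extended Doolittle construction can be sketched for any number of factors as follows. This is a minimal illustration under stated assumptions, not the authors' computing routine: the input is taken to be a symmetric, nonsingular matrix of factor intercorrelations, and the names `extended_doolittle`, `matmul`, and `transpose`, together with the example values, are hypothetical.

```python
import math


def extended_doolittle(R):
    """Return (E, D, T) built from the augmented table [R | -I].

    D is returned as the list of diagonal entries of the matrix D.
    """
    n = len(R)
    a_rows, r_rows = [], []   # the X(A) rows and X(R) rows, in order
    for k in range(n):
        # Row k of R, extended on the right by row k of -I.
        row = [float(x) for x in R[k]]
        row += [-1.0 if j == k else 0.0 for j in range(n)]
        # Add each earlier X(A) row times the column-k entry of its X(R) row.
        for a_prev, r_prev in zip(a_rows, r_rows):
            mult = r_prev[k]
            row = [x + mult * y for x, y in zip(row, a_prev)]
        pivot = row[k]
        if pivot <= 0.0:
            raise ValueError("factors appear to be linearly dependent")
        a_rows.append(row)                         # X(A): obtained by addition
        r_rows.append([-x / pivot for x in row])   # X(R): negative reciprocal
    # Extended portions of the X(R) rows become the columns of E.
    E = [[r_rows[k][n + i] for k in range(n)] for i in range(n)]
    # Square roots of the leading X(A) entries form the diagonal of D.
    D = [math.sqrt(a_rows[k][k]) for k in range(n)]
    # Equation (31), T = ED: scale column k of E by D[k].
    T = [[E[i][k] * D[k] for k in range(n)] for i in range(n)]
    return E, D, T


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


R = [[1.0, 0.6], [0.6, 1.0]]        # two factors with r_AB = .60
E, D, T = extended_doolittle(R)
P = [[0.5, 0.7], [0.8, 0.4]]        # made-up oblique projections for two items
F = matmul(P, T)                    # equation (28): PT = F
print(T)
print(F)
```

One way to check a computed T is to confirm that the product of its transpose, the factor intercorrelation matrix, and T itself is an identity matrix, which is what orthogonality of the new axes requires.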

One word of warning appears necessary. If any X(A)-row entry begins 
to approach zero (say becomes .20 or less), the probability is that only error 
remains (if reliability is greater than .80, the remainder could be lower). 
Any variable for which this happens should be dropped throughout the entire 
Doolittle, since all of its real variance is already accounted for by other 
tests already included. 


Use of Tetrachorics in Place of Product-moment Correlations 
in Factoring Items 


In the development presented so far, it has been assumed that the 
matrix of inter-item correlations to be factored has as elements product- 
moment correlations. Equation (7), which is basic to all of the development, 
holds only for product-moment correlations. However, because of the dicho- 
tomous nature or highly skewed distribution of responses in many types 




of items, it would be more appropriate empirically to take as the starting 
point for the analysis a matrix of inter-item tetrachorics. Wherry and Gay- 
lord (10) have summarized the major arguments for the use of tetrachorics 
in factor analysis of items. In order to use tetrachoric correlations as the 
starting point, certain modifications in the procedures developed above are 
necessary. 

The adjustments to be described in this section are needed only when 
iterative corrections to the communalities are made. In the practical applica- 
tions of the method this iteration is not actually necessary if the number 
of items in a subtest is ten or more. Before developing the basis for these 
adjustments let us examine the tetrachoric function. It is derived from a 
normal bivariate distribution and therefore assumes that the underlying 
variables are each continuous and normally distributed. Implicit also is the 
assumption that the regression is linear. If the variables satisfy these con- 
ditions, the product-moment correlation and the tetrachoric correlation are 
identical and equation (7) is valid for both types of coefficients. The question 
arises as to whether this equation is an estimator of an empirically determined 
tetrachoric. In other words, if the $r_{ij}$ on the right side of equation (7)
are tetrachorics, is the left side of this equation a good estimate of the cor- 
responding tetrachoric obtained by direct computation? 

The theoretical answer to this question would be in the affirmative 
if the implicit assumptions were reasonably fulfilled. Some of these assump- 
tions appear, however, to be quite drastically violated when one is dealing 
with relationships between test items. Rather extensive checks on equation 
(7) have been made using tetrachorics on the type of data one is likely to 
encounter in the area of testing and scaling problems. These empirical checks 
(12) indicate that observed tetrachorics computed from the basic data 
are of the same order of magnitude as those obtained by using tetrachorics 
in equation (7). Further, the two sets of data correlate approximately .90. 
Widely varying difficulties as well as diverse item-types were included in
these investigations. All of the tetrachorics were computed from dichotomies 
as near the median as the distributions of the variables would permit; some 
of these dichotomies, however, had to be as extreme as 15-85. Considering 
the fact that a shift in the point of dichotomization will change the correlation 
for individual fourfolds, the estimate given by equation (7) is remarkably 
good. It would seem, therefore, that the violations of the assumptions implicit 
in equation (7) are not great enough to offset obtaining unbiased estimates 
of tetrachorics by entering tetrachorics in equation (7), at least for the data 
that one is likely to obtain when working in this area. 

Let us now turn to a closely related aspect of this problem. In the course 


of the development, we set 


$$\sigma_{K'}^{2} / \sigma_{i}^{2} = \sum \sum r_{ij} + n_{K'}.$$










Because the tetrachoric correlations will in general be larger than the cor- 
responding product-moment correlations obtained from items which are 
dichotomously scored or items which have only a few categories, the ratio 
of the variances, which estimates the sum of the product-moment correlations, 
will tend to underestimate the sum of the corresponding tetrachorics. What 
one needs, therefore, as an estimate of the sum of inter-item tetrachorics 
is the ratio of the variance of subtest to item under the assumption of contin- 
uous, bivariate normally distributed items. In order to arrive at this estimate, 
let us first see if we can obtain an estimate of $r_{ij(\text{tet})}$ from $r_{ij(\text{pm})}$, where
it is assumed that the latter variables have been grouped into broad cate- 
gories but are basically continuous, bivariate normal. 

Let us consider two cases. Under Case 1 we will assume that the items 
have been dichotomously scored. Under Case 2 the assumption will be made 
that the number of scoring categories for each item is three or more. In 
both cases we make the assumption that all pairs of items have bivariate 
normal distributions. 


Case 1 
Under the assumption that the point of dichotomization of both variables
is at the median, the tetrachoric correlation coefficient is given by

$$r_{(\text{tet})} = \sin\left[2\pi\,\frac{ad - bc}{N^{2}}\right],$$

where a, b, c, d refer to cell frequencies in a fourfold. But under the assumption
of median splits on both variables we have

$$r_{\phi} = \frac{ad - bc}{\sqrt{(a+b)(c+d)(a+c)(b+d)}} = \frac{ad - bc}{\sqrt{(N/2)^{4}}} = \frac{4(ad - bc)}{N^{2}}.$$

Thus, we have

$$r_{(\text{tet})} = \sin\left(2\pi\,\frac{r_{\phi}}{4}\right) = \sin\left(\frac{\pi}{2}\,r_{\phi}\right). \tag{32}$$


Equation (32) yields a good approximation of $r_{(\text{tet})}$ from $r_{\phi}$ when the points
of dichotomization are close to the median. (Empirical checks indicate that
the approximation is good even if the dichotomies are as extreme as 30-70.)

The ratios, $r_{(\text{tet})}/r_{\phi} = G_{\phi}$, as computed from equation (32) are given
in Table 2. In this table it will be noted that changes in $G_{\phi}$ are relatively
small for rather large changes in $r_{\phi}$. Hence within restricted ranges of $r_{\phi}$,
$G_{\phi}$ can be considered to be constant without introducing appreciable error.
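The median-split relations can be checked numerically. The sketch below is illustrative only; it assumes a fourfold table with a and b in the first row and c and d in the second, and the function names and cell frequencies are made up for the example.

```python
import math


def phi_coefficient(a, b, c, d):
    """Phi coefficient of a fourfold table with rows (a, b) and (c, d)."""
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))


def tetrachoric_median_split(a, b, c, d):
    """Tetrachoric r when both variables are split at the median."""
    n = a + b + c + d
    return math.sin(2.0 * math.pi * (a * d - b * c) / n ** 2)


def tetrachoric_from_phi(r_phi):
    """Equation (32): tetrachoric r from phi under median splits."""
    return math.sin(math.pi * r_phi / 2.0)


# Made-up fourfold with every marginal total equal to N/2 = 50.
a, b, c, d = 35, 15, 15, 35
r_phi = phi_coefficient(a, b, c, d)
print(r_phi)
print(tetrachoric_median_split(a, b, c, d))
print(tetrachoric_from_phi(r_phi))
```

With median splits the two routes to the tetrachoric coincide; for splits away from the median the second and third functions are approximations only.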



















TABLE 2
Values of $r_{(\text{tet})}/r_{\phi} = G_{\phi}$

  $r_{\phi}$     $G_{\phi}$
   .05     1.56
   .10     1.56
   .15     1.56
   .20     1.54
   .30     1.51
   .40     1.47
   .50     1.41
   .60     1.35
   .70     1.27
   .80     1.19
   .90     1.09
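The tabled ratios follow directly from equation (32), and a short sketch makes it easy to obtain the ratio for intermediate values of phi. The function name `g_ratio` is hypothetical; the computed values agree with Table 2 to within about .01.

```python
import math


def g_ratio(r_phi):
    """Ratio of the tetrachoric implied by equation (32) to phi."""
    return math.sin(math.pi * r_phi / 2.0) / r_phi


for r_phi in (0.05, 0.10, 0.15, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90):
    print(f"{r_phi:.2f}  {g_ratio(r_phi):.3f}")
```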





To return to the problem of estimating a sum of tetrachorics from a 
sum of phis, under the model used to derive equation (32) we have 
$$\sum r_{(\text{tet})} = \sum G_{\phi}\,r_{\phi}.$$
The sum that we are interested in estimating is obtained from the inter- 
correlation of variables that are considered to belong to a group in the multiple- 
group centroid method of factoring. Hence, the range of inter-correlations 
within this sum is not likely to be large (i.e., not more than 20 correlation 


points). One is reasonably sure, therefore, that a good estimate of the sum 
over inter-item tetrachorics within subtests (assuming the model) could be 


obtained from 
$$\sum r_{(\text{tet})} = G_{\bar{\phi}} \sum r_{\phi}, \tag{33}$$

where $G_{\bar{\phi}}$ is the ratio associated with $\bar{r}_{\phi}$. This latter value can be estimated
by calculating sample values of the inter-item correlations.
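To see how much is lost by using a single ratio for a whole subtest, the following sketch compares the estimate of equation (33) with the sum obtained by converting each phi separately through equation (32). It is an illustration under the median-split model only; the function names and the three phi values are hypothetical.

```python
import math


def g_ratio(r_phi):
    """Ratio of tetrachoric to phi implied by equation (32)."""
    return math.sin(math.pi * r_phi / 2.0) / r_phi


def tetrachoric_sum_estimate(phis):
    """Equation (33): one ratio, taken at the mean phi, applied to the sum."""
    mean_phi = sum(phis) / len(phis)
    return g_ratio(mean_phi) * sum(phis)


def tetrachoric_sum_termwise(phis):
    """Sum of tetrachorics when each phi is converted by equation (32)."""
    return sum(math.sin(math.pi * r / 2.0) for r in phis)


phis = [0.20, 0.30, 0.40]   # made-up inter-item phis spanning 20 correlation points
print(tetrachoric_sum_estimate(phis))
print(tetrachoric_sum_termwise(phis))
```

For this made-up set, which spans the 20 correlation points mentioned in the text, the two sums differ by about .01.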

It should be noted that a check on the appropriateness of the $G_{\bar{\phi}}$ value
can be obtained in the process of estimating $\bar{r}_{\phi}$ by also computing the cor-
responding tetrachorics. To test the empirical validity of equation (33)
when the underlying assumptions in the model are not completely met,
inter-item tetrachorics as well as phi coefficients were computed for a series
of eight subtests. These subtests were made up of attitude-type items in a
leader-behavior description questionnaire (12). The subtests each had approxi-
mately 15 items and $\bar{r}_{\phi}$ for the subtests ranged from .20 to .40. The ratios
of the sums of the tetrachorics to the corresponding sums of the phis ranged
from 1.82 to 1.31. Many of the items in these subtests could be dichotomized
only outside of the range 30-70; yet the agreement between empirically and
theoretically determined $G_{\bar{\phi}}$ was above r = .90. It would seem that the
underlying model is applicable even though some of the assumptions are
far from being met.










It should be noted that this adjustment involves substituting
$[G_{\phi K'}(a_{K'} - n_{K'})]$ for $(a_{K'} - n_{K'})$ in those formulas in which the latter ex-
pression appears. Thus equation (11) becomes
| = Gix:(ax: —= Nk’). (34) 


One must also replace ax. by its adjusted value, V 8. + Nr. 


Case 2 

For this case we assume that the items have underlying bivariate normal 
distributions, but have been grouped into as many categories as there are 
scoring categories. Under restrictions to be indicated, an estimate of the 
product-moment correlation that would be obtained if the data were not 
grouped into broad categories is given by 


$$r_{x'y'} = \frac{r_{xy}}{r_{xx'}\,r_{yy'}},$$
where the primes indicate the continuous variables. If the terms in the
denominator are considered to be independent of that in the numerator,
this estimate obviously cannot hold for high values of $r_{xy}$. For values of
$r_{xy}$ in the range from .00 to .40 the error introduced by assuming this inde-
pendence is not appreciable. Assuming that each of the categories of the
grouped frequencies is represented by the mean of the category, the cor- 
relation between grouped and ungrouped normally distributed data can be 


estimated by (see 6, 395) 


If we assume that all items have the same number of scoring categories,
and that the categories are comparable, $r_{xx'} = r_{yy'}$. Although $r_{x'y'}$ is not
actually a tetrachoric correlation, it is probably of the same order of magni-
tude as a tetrachoric or a slight underestimate thereof.

The theoretical value of $r_{xx'}$ can be readily estimated. The value of
$r_{x'y'}$ can then be computed. The ratios of estimates of the product-moment
correlations based upon continuous variates to those based upon categorized
variates are as follows:


No. of effective
categories         $r_{x'y'}/r_{xy}$
Continuous         1.00
8                  1.03
5                  1.12
4                  1.19
3                  1.36
2                  1.57
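The correction for broad categories can be sketched as follows. The sketch rests on two assumptions that are not taken from the text: it uses the standard result for the correlation between a standard normal variate and its grouped version when each category is scored at its mean, and it treats the "effective" categories as equal-width intervals spanning three standard deviations on either side of the mean, with the end categories absorbing the tails. Under those assumptions it gives values close to the tabled ratios for two to five categories. All function names are hypothetical.

```python
import math


def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def grouped_ungrouped_r(cuts):
    """Correlation between a standard normal variate and its grouped version,
    each category being represented by its mean."""
    bounds = [-math.inf] + sorted(cuts) + [math.inf]
    total = 0.0
    for lo, hi in zip(bounds, bounds[1:]):
        p = normal_cdf(hi) - normal_cdf(lo)
        pdf_lo = 0.0 if math.isinf(lo) else normal_pdf(lo)
        pdf_hi = 0.0 if math.isinf(hi) else normal_pdf(hi)
        total += (pdf_lo - pdf_hi) ** 2 / p
    return math.sqrt(total)


def equal_width_cuts(k, span=3.0):
    """Cut points for k equal-width categories over -span..+span (an assumption)."""
    width = 2.0 * span / k
    return [-span + width * i for i in range(1, k)]


def correction_ratio(k):
    """Ratio r_x'y' / r_xy when both items have k comparable categories."""
    r = grouped_ungrouped_r(equal_width_cuts(k))
    return 1.0 / (r * r)


def corrected_r(r_xy, r_xx, r_yy):
    """Estimate of the ungrouped correlation from the grouped correlation."""
    return r_xy / (r_xx * r_yy)


for k in (2, 3, 4, 5):
    print(k, round(correction_ratio(k), 3))
```

Because the ratio depends only on where the cut points fall, unequal or poorly placed categories can be examined by passing their cut points to `grouped_ungrouped_r` directly.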













It is important to realize that the number of effective categories differs 
from the number of categories that one may set up for scoring purposes; 
unless the category has the proportion of frequency corresponding to the 
category of which it is supposed to be the counterpart in the normal distri-
bution, the category cannot be considered “effective.” It will be noted that
for two effective categories, within the range .00 to .30, the value of $G_{\phi}$ is
quite close to $r_{x'y'}/r_{xy}$. Indications are that this latter ratio is a slight under-
estimate of the counterpart of $G_{\phi}$ for three or more categories; the larger
the number of categories, the larger the degree of underestimation.

It is recommended that in all ranges of $r_{xy}$, the ratio $r_{(\text{tet})}/r_{(\text{pm})}$ be
estimated directly from samples of the data. Where the empirically determined
ratio and the theoretical estimate differ appreciably, the sample should be
large before a final estimate is made. Having arrived at an adjustment factor,
G, one must make the adjustments indicated in the last paragraph under Case 1.


Summary of Operations 


1. Test items are divided into what are judged to be relatively inde- 
pendent subtests by experts. Not all items need be assigned to subtests— 
only those items on which experts agree should be included in the subtests. 
The number of subtests must be equal to, or greater than, the number of 
factors that would be extracted from the matrix of inter-item correlations. 

2. All persons are scored on all subtests. 

3. The intercorrelations between subtest scores, $r_{K'L'}$, are computed.

Note: Steps 4 and 6 through 10 can be omitted for subtests having 
ten or more items. 

4. The standard deviation of each subtest is computed. The average 
standard deviation of the items in each of the subtests is computed or esti- 
mated. 

5. First approximations to the between-group correlations, $r_{KL}$, are
obtained from the between-subtest correlations, $r_{K'L'}$, by use of equation (25).
As a step towards obtaining a linearly independent set of subtests, the 
Z portion to the Wherry test selection procedure (8) can be used, starting 
with the subtest having the lowest average intercorrelation with all other 
subtests and progressively adding subtests whose variances show least 
overlap with selected tests, i.e., those having the largest Z values. (A rough 
computational procedure to follow is to eliminate subtests whose unpredicted 
variance becomes less than .20. This procedure is equivalent to assuming 
that approximately 20% of the total subtest variance is error due to a com- 
bination of unreliability and sampling.) 

6. The item-subtest correlations, $r_{iK'}$, are computed for all items
against all subtests retained in step 5. These correlations should be tetra- 
chorics. 

7. The item-subtest correlations for items in subtests are converted
to projections on group centroid axes, $r_{iK}$, by means of equations (18),
(19), (33) or its equivalent, and iterative use of equation (13a).

8. The $r_{K'L'}$ are now converted to better estimates of $r_{KL}$ by means
of equation (24).

9. Step 5 is repeated using values of $r_{KL}$ computed from step 8 in place
of the $r_{KL}$ called for in step 5.

10. For the subtests selected in step 9, the $r_{iK'}$ for items not in these
subtests are converted to $r_{iK}$ by means of equation (21). Note: If item i
belongs to subtest P' but does not belong to subtest Q', $r_{iQ'}$ is subject to
this conversion. All items not included in any of the subtests are also subject
to this conversion.

11. From the intercorrelation matrix (matrix of $r_{KL}$) of the selected
group factors, a transformation matrix is computed by means of the extended
Doolittle procedure outlined. Multiplying the matrix of item-group centroid
projections by the transformation matrix yields an estimate of the orthogonal
factor loadings.

12. These orthogonal loadings are rotated to meaningfulness by any of the
regular rotation methods.

REFERENCES 
1. Dunlap, J. W., and Cureton, E. E. On the analysis of causation. J. educ. Psychol. 
1930, 21, 657-680. 

2. Gleser, G., Loevinger, J., and DuBois, P. Resolution of a pool of items into relatively
homogeneous subtests. Amer. Psychologist, 1951, 6, 401. (Abstract.)

3. Gordon, L. V. A comparison of the validities of the forced-choice and questionnaire 
methods in personality measurement. Unpublished Ph.D. dissertation, Ohio State 
University, 1950. 

4. Loevinger, J., Gleser, G. C., DuBois, P. H., and Berkeley, M. H. A new method for 
constructing multiple score tests. Amer. Psychologist, 1951, 6, 303-304. (Abstract.) 

5. Lucas, C. M. An emergent category approach to the study of adolescent needs. Unpub-
lished Ph.D. dissertation, Ohio State University, 1951.

6. Peters, C. C., and VanVoorhis, W. R. Statistical procedures and their mathematical 
bases. New York: McGraw-Hill Book Co., 1940. 

7. Phelps, H. R. A factor analysis of adolescent informal group activities and attitudes. 
Unpublished Ph.D. dissertation, Ohio State University, 1951. 

8. Stead, W. H., and Shartle, C. L. Occupational counseling techniques. New York: 
American Book Co., 1940, Appendix V, 245-250. 

9. Wherry, R. J. Factor analysis of Officer Qualification Form Q.C.L. Form QCL 2b. 
Research Foundation, The Ohio State University, Report to Department of the Army, 
Feb. 28, 1950. 

10. Wherry, R. J., and Gaylord, R. H. The concept of test and item reliability in relation
to factor pattern. Psychometrika, 1948, 8, 247-269.

11. Wherry, R. J., Perloff, R., and Campbell, J. T. An empirical verification of the Wherry- 
Gaylord iterative factor analysis procedure. Psychometrika, 1951, 16, 67-74. 

12. Winer, B. J. Iterative factor analysis: its psychological and mathematical bases. 
Unpublished Ph.D. dissertation, Ohio State University, 1952. 




Manuscript received 9/22/52 


Revised manuscript received 11/11/52 














BOOK REVIEWS 


VERNON, PHILIP. The Structure of Human Abilities. Methuen’s Manuals of Modern Psy-
chology. London: Methuen & Co. Ltd.; New York: John Wiley & Sons, Inc., pp. 160.


Regardless of your factor-analytic faith, regardless of your disposition to worship g, 
tolerate it, ignore it, or deny it, you will be pleased with this clear, non-technical exposition 
of factor analysis. Professor Vernon writes, “I assume only that the reader has had an ele- 
mentary course in psychology and knows what an intelligence test and a correlation 
coefficient are.” He succeeds in holding to this assumption and yet is able to cover with
admirable lucidity the fundamental concepts of factor analysis, the problems and limitations 
that it is currently facing, and the several conflicting theories. 

This is a fine general description. Someone who has been working with factor analysis 
a long time will find the book goes over much familiar ground. In spite of this, such a person 
will find it very worth while, especially in demonstrating how the familiar concepts may be 
verbalized. The book is very appropriate for students as an introduction to the subject of 
factor analysis and for workers in other areas of testing and of psychology who want to 
know more about all the argument. 

The dedication reads, “to C. Burt and G. H. Thomson (with whom I almost always
agree) and to L. L. Thurstone and the late C. E. Spearman (with whom I usually disagree).”
Vernon indicates at all points his preference for the extraction of g and a few major group 
factors. Nevertheless his discussion of the general concepts of factor analysis is entirely 
suitable for devotees of all methods, and his comparison of the diverse methods, although 
definitely one-sided, is by far the clearest brief explanation of the situation that is available. 

The first three chapters explain the general theory of factor analysis and its limitations, 
its historical development, the differences among the several methods, and the author’s 
preferred group-factor method and its special implications. Considerable attention is paid 
to the hierarchical group-factor theory of the structure of abilities. The position is taken 
that g heads the hierarchy, that the major group factors verbal:educational (v:ed) and 
spatial:mechanical (k:m) are on a second level, that the minor group factors are at a third 
level, and that specific factors branch down from there. Vernon admits that the hierarchy 
is not perfect; for example, scientific ability cuts across the two major group factors. He 
points out that factors at any level can be obtained by means of an appropriate selection 
of tests. The reviewer feels that to emphasize this hierarchy as a way of visualizing the 
structure of abilities will prove to be misleading, since any test can be placed at any level 
in the hierarchy simply by properly selecting the other tests in the battery. The usefulness 
of factor analysis rests, to a large extent, on the existence of natural clusters of correlated 
tests reflecting certain unities of function. The most parsimonious hypothesis regarding the 
structuring of these clusters or factors is that they be regarded as variously extended and 
sized and variously overlapping behavior syndromes with perhaps a tendency for the cog- 
nitive kind to overlap a common area—g. 

The first three chapters beautifully set forth the reasons for using the factor-analytic 
method, its uses, and its limitations. These are described from the viewpoint of the up-to- 
date worker in the field and cover all the important considerations that should be borne in 
mind while using the method. Most of these are considerations that would be equally well 
agreed to by users of all factor-analytic methods. Here, for example, are discussions of 
faculties vs. factors, factor analysis as an empirical approach to human abilities, identifica- 
tion of factors, limitations of factor analysis, broad and narrow group factors, effects of 
range on factor patterns, the effect of age on factor patterns, and many other problems. 




Some two-thirds of the book is devoted to chapters discussing the factorial findings in 
the various areas of testing. These chapters present the results of both British and American 
studies in a discursive rather than tabular manner. On the whole they give a very fair picture 
of the findings, including a very fair picture of the confusions. The more technically oriented 
reader may have occasion to find some fault with these chapters for several reasons: 
(1) the results of the various factorial studies are discussed without indication as to method 
of analysis, (2) the results are discussed by reference to test names with no further indication 
as to what the tests are like, and (3) coverage of studies is only moderate, some areas such 
as personality being ignored completely. 

A 7-page appendix compares the general-plus-group-factor theories with the multiple- 
factor theories. Although devotees of Thurstone’s method will not agree with the import of 
that appendix, they will find it a remarkably precise statement of the theoretical differences. 
Most of the appendix is devoted to seven reasons for the superiority of general-plus-group- 
factors. These are given below, each condensed into one sentence. (1) In all but highly 
selected populations g is too big to belittle. (Vernon indicates, however, that sometimes we 
are interested in selecting populations and that multiple-factor analysis can reveal group 
factors in such situations that the general-plus-group-factor method might obscure.) 
(2) g and the major group factors are more nearly invariant than are multiple factors with 
changing populations and changing tests. (3) Group factoring is quicker than multiple 
factoring. (4) The “primary” factors are so divisible that it is difficult to see where factor-
ization is to stop, except by stopping with the smallest factors that are useful, presumably
either practically or conceptually. (5) Since no test measures a single factor and the g or 
other content must be removed by a suppressor test, why not admit that all tests involve g, 
instead of artificially removing it by means of rotation? (6) Hierarchy is not merely a sta- 
tistical artifact; it is best understood in terms of general-plus-group-factor theory. (7) The 
multiple-factor theories encourage factor naming and the false belief that tests will predict 
success on jobs having activities apparently similar to those involved in the factor, while a 
short battery of the major group factors, v:ed and k:m, will serve almost all predictive 
purposes. 

It is not within the scope of this review to present the opposing view on each of these 
points. Perhaps the opposition would have most to say on the subject of invariance. They 
would maintain that the particular g extracted from a given battery depends upon the 
particular tests included, and that the group factors are also dependent on the particular
tests included until rotation finds the natural clusters in several areas where some consis- 
tency in the reactions of the subjects results in concomitant variation of test scores. 

In conclusion it should be emphasized that the value of a book does not depend on 
the extent to which the reader agrees with its contents. Although the reviewer prefers a 
school of thought widely different from that of the author, the book, particularly Chapters 
1-3 and the appendix, was found by him to be remarkably stimulating and a great clarifier 
of a muddled situation. 


Educational Testing Service John W. French 


ADKINS, DOROTHY C., AND LYERLY, SAMUEL B. Factor Analysis of Reasoning Tests. Chapel
Hill: Univ. North Carolina Press, 1952, pp. iv + 122. $2.00.


Until very recently, the realm of reasoning abilities was probably the least adequately 
explored of the recognized cognitive functions. Neither in the definitive studies of Thurstone 
nor in the comprehensive factorial investigations of the Army Air Forces Psychology Pro- 
gram (Guilford and associates) appear consistent or satisfactory determinations of reasoning 
factors. The authors of this book, recognizing the need for intensive investigation of the 













reasoning domain, report a project designed “to clarify the underlying nature of the abilities
affecting performance in types of tests that have been identified previously or suggested as
measures of reasoning” (p. 4).

The report is presented in two sections. In Part I is described a factor analysis of 
38 tests, selected on the basis of their factorial content, from the battery included in the 
Army Air Forces Psychology Program, Report No. 5, Printed Classification Tests. The 
correlations analyzed were those reported in the Air Force study. On the basis of the analy- 
sis, 18 tests were chosen for inclusion in the 66-variable study reported in Part II. The other 
tests administered for this second, major analysis were chosen from a variety of sources, 
and some were developed specifically for this study. In addition to the 65 tests finally 
selected, the number of years of formal schooling was included as a variable. Subjects were 
200 enlisted men, selected by performance on an Army classification aptitude battery to be 
representative of the population of enlisted men in the Army. Following the normalization 
of all variables, product-moment correlation coefficients were computed (IBM equipment 
was used) and a centroid factor analysis was performed. 

Where the aim of a factor analysis project is to discover consistent, meaningful con- 
stellations of ability from the interrelations among a group of variables, the editorial selec- 
tion of variables is obviously of crucial importance. It is noteworthy that the authors 
agreed “to devote a sizeable portion of the available resources to deciding upon, selecting
or constructing, and editing the tests to be used” (p. 4). The literature was systematically 
reviewed for test ideas. Individual psychologists and philosophers were invited to submit 
test ideas and hypotheses as to the nature of reasoning. There is presented evidence of 
careful selection of tests which cover a wide range of reasoning tasks. In addition, tests 
measuring non-reasoning abilities were included—at least two such tests for each of nine 
previously identified factors, e.g., Verbal Relations, Number, Space factors, Closure factors, 
etc. 

To prevent the ambiguity of interpretation which arises when a factor is defined by 
tests alike in type of mental operation but also alike in medium of presentation, tests con- 
sidered to be of similar function were chosen from differing media of presentation. One of a 
number of examples of the perspicacity of this approach appears in the interpretation of one 
of the reasoning factors, named “Perception of Abstract Similarities.” The factor is defined
by two verbal classification tests, two figures classification tests, both verbal and figure 
analogies tests, and a test of analogies of meaningful pictures. It is apparent that the process 
underlying successful performance on the tests transcends the medium in which they are 
presented, at least within the limitations of a group-administered paper and pencil test 
battery. 

Sixteen centroid factors were extracted and rotated into oblique simple structure. 
For thirteen of these, interpretations are offered. Four reasoning factors are presented. In 
addition to the factor “Perception of Abstract Similarities,” reasoning factors have been
named “Hypothesis Verification” (best defined by the series of Raven’s Progressive Matrices
tests), “Deduction,” the ability to draw correct inferences (best defined by False Premises 
and Identical Forms tests), and “Concept Formation” (best defined by tests demanding 
that the subject assign to a group of objects pictured or named the name of the narrowest 
category which subsumes all objects). Also suggested to be allied to reasoning is “Flexibility
of Perceptual Closure,” Thurstone’s second closure factor, one of the nine reference factors
in the analysis. 

The interpretations which appear in the book, in general, are convincing. They depend 
not only upon the characteristics of tests exhibiting high factor loadings, but also upon the 
nature of tests not exhibiting high factor loadings—it is often of critical importance to 
discover “Why not?” There is apparently a tacit recognition of the provisional character
necessarily imposed, by inherent limitations of factor analysis, upon interpretations of 
rotated factors. The frequent references to earlier studies are of considerable aid to the reader
in establishing similarities between factors here reported and those identified in previous
investigations of mental abilities. Differences, too, are reported, particularly with respect 
to the Air Force studies. There is discovered no correspondence between the characteristics 
of the several reasoning factors discussed in the Air Force Report No. 5 and those of the 
reasoning factors isolated here. To the reviewer the interpretations of the present study 
seem to provide a more satisfactory picture of reasoning abilities and the interpretations 
make good sense, psychologically. However, further investigation of the discrepancies 
certainly is warranted. 

In most respects, this book is extremely comprehensive. Each test is described suc- 
cinctly in terms of content, time limits, scoring formula, etc., and both raw score and nor- 
malized score frequency distributions are exhibited. Complete tables of test intercorrela- 
tions and of 16th-factor residuals are presented, in addition to tables of centroid and oblique 
factor loadings, the transformation matrix, and the matrix of cosines of reference vectors. 
Useful information which might have been presented, but is not, includes the distribution 
of number of items completed on each test (from which it would be possible to obtain an 
estimate of the level of chance performance) and graphical representation of pairs of reason- 
ing factors (to supply pictorial guidance for the assessment of interrelations among these 
factors). 

This work provides considerable advance toward the goal of organizing our knowledge 
of reasoning abilities. The study supplies a framework of hypotheses, the confirmation or 
revision of which might be expected to lead directly to stable primary abilities of reasoning. 
In addition to serving as a guide valuable to both theoreticians and practitioners interested 
in the measurement of intellective functions, the study serves as an example of one of the 
most fruitful applications of factor analysis methods. 


University of Chicago Lyle V. Jones


LLOYD A. JEFFRESS (Ed.), Cerebral Mechanisms in Behavior, The Hixon Symposium.
New York: John Wiley, 1951, pp. xiv + 311, $6.50. 


This book contains the papers given during the Hixon Symposium at the California 
Institute of Technology in September, 1948. Following each paper is an edited transcript 
of the discussion. 

The first paper, by John von Neumann, is “The General and Logical Theory of
Automata.” Dr. von Neumann runs through the similarities and some of the critical
differences between artificial and natural automata, between computing machines and the
central nervous system. He concludes that the inferiority of our materials and the absence
of any adequate theory prevent us from attaining the high degree of complication and the
small dimensions of natural automata. The McCulloch-Pitts theory, built on the present 
system of formal logic, is inadequate. A new logic is needed whose procedures allow a low 
but non-zero probability of errors. Turing’s results are extended to a theory for self- 
reproducing automata. The paper was received by the other participants with skeptical 
remarks like the following: 


McCulloch, “I envy Dr. von Neumann the fact that the machines with which he has to 
cope are those for which he has, from the beginning, a blueprint of what the machine is 
supposed to do and how it is supposed to do it.” 


Gerard, “I have had the privilege of hearing Dr. von Neumann speak on various occasions, 
and I always find myself in the delightful but difficult role of hanging on to the tail of a 
kite.”


Weiss, “I question whether a mechanism in which all these innumerable contingencies have
been foreseen, and the corresponding corrective measures built in, is actually conceivable.”


Lashley, “It seems to me the question of precision of the organic machine has been somewhat
exaggerated.”


Halstead, “I suspect that von Neumann biases his automata towards rationality by careful
regulation of the energies of the substrate.” 


Lorente de Nó, “Possibly the automaton can be made to maintain memory, but the automaton
that does would not have the properties of our nervous system.”

Warren S. McCulloch presented the second paper, “Why the Mind is in the Head.”
He asserts that the nervous system is par excellence a logical machine. It is a highly redun- 
dant machine because information handling capacity is sacrificed for dependability. The 
notion of negative feedback is considered to be neurophysiologically important. Finally, 
McCulloch reviews in some detail the neural circuits he has proposed for form perception. 
This paper evoked such remarks as: 


Lorente de Nó, “Dr. McCulloch has brought what we know of both the anatomy and the
physiology of the brain closer to an integrated whole than it has ever been before.” 


von Neumann, “I see the plausibility of what you say, but I still have a residue of uncertainty
left.”

Gerard, “If these networks of neurons are organized so beautifully in the striate, then how 
do you account for some of Dr. Lashley’s critical experiments on destruction of different 
parts of the brain?” 


Köhler, “I admire the courage with which Dr. McCulloch tries to relate his neurophysiology
to facts in psychology. But I sometimes feel like criticizing the results.” 


Lashley, “I am very much in sympathy with the type of development represented in the
last two papers. At the present time, however, such a formulation involves a great over- 
simplification of the problems.” 

The third paper, by Lorente de Nó, had to be omitted. The fourth, “The Problem of
Serial Order in Behavior,” was given by K. S. Lashley. Lashley argues that the temporal 
organization of behavior has never been properly considered. The notion of chains of 
associated reflexes is not adequate. A variety of examples, most of them linguistic, lead 
Lashley to consider a “priming” mechanism that gets responses ready before they occur. 
Temporal order is probably closely related to spatial order. The other participants com- 
mented: 

Klüver, “In my opinion, this is the first time since 1914 that a neurological thinker has
presented such a trenchant analysis of the role of the time factor in behavior.” 


Halstead, “I have been greatly impressed with the case that Dr. Lashley has made for non-specific,
non-mosaic representation.”


Gerard, “I find it impossible to think through or even towards the complexities of behavior
if restricted to atomic units travelling along atomic fibers.”


Lorente de Nó, “While I was listening there was going through my head a mental picture of
a number of experiments that I intend to perform—suggested to me by Dr. Lashley’s 
speech.” 


Weiss, “The great value of Dr. Lashley’s presentation lies in the fact that it places rigorous
limitations upon the free flight of our fancy in designing models of the nervous system.”

The fifth paper, by Heinrich Klüver, was “Functional Differences between the
Occipital and Temporal Lobes.” Klüver reviews his work on the occipital and temporal
lobes. Removing the occipital lobe causes a monkey to behave as though his eye were a
simple photocell which records only changes in light flux. Removing the temporal lobes
does not produce much sensory effect but causes remarkable change in behavior. Klüver
then calls attention to extracerebral mechanisms that exert obscure influences on the brain
and illustrates them by his own work on the role of porphyrins in the central nervous system.
Sample comments were:


McCulloch, “Each time we get one of these problems in which we are concerned on the one
side with chemistry, and on the other side with the structure of the nervous system, we get 
into difficulties which take us years and years to solve.” 


Gerard, “I am particularly grateful to Dr. Klüver for, in a sense, putting the brain back in
the body.” 


Köhler, “I have perhaps missed the connection between the two parts of Dr. Klüver’s
paper.” 

Wolfgang Köhler read the sixth paper, “Relational Determination in Perception.”
He begins with a review of his experiments on figural after-effects and argues that they 
should be interpreted in terms of direct currents flowing through the brain tissue. This 
argument led to experiments searching for such direct currents. Some reactions to this 
paper were: 


Lashley, “I am at a loss to see where further development of the theory will lead.”


Lorente de Nó, “From looking at your records I don’t see any reason why they are not
perfectly legitimate records and why we are not now in the presence of a new phenomenon 
in physiology.” 


Gerard, “It is somewhat to the shame of physiologists that the spontaneous rhythm of the
human brain was discovered by a psychiatrist—the Berger rhythm. Now, again, it is not 
a physiologist, but a psychologist who has had the courage to try a reasonable gamble and 
look for his still slower changes directly in the human brain. I am much inclined to think 
that he has found them.” 


Liddell, “How do you propose to follow this clue of the slowly fluctuating cortical potentials 
when you change over to the kinesthetic and tactile fields?”

The seventh paper, “Brain and Intelligence,” was given by Ward C. Halstead. His
paper follows along many of the ideas of his book Brain and Intelligence and treats the 
effects of lesions on intelligence, the factors in biological intelligence, the role of the frontal 
lobes, etc. His paper evoked such comments as: 

Lashley, “I think this is the most promising method of approach to the whole problem of 
cerebral localization that has been made.” 

Nielsen, “Dr. Halstead is the only psychologist that I have ever heard of who can tell by
his psychological tests that the frontal lobes have been taken off.” 

Klüver, “Dr. Halstead’s intensive analysis has thrown new light on the functional significance
of the frontal lobes.”

Lindsley, “I am sure that the stimulation of Dr. Halstead’s work will direct a number of
psychologists into this kind of application.”

The volume closes with a review of the symposium from the viewpoint of a clinician,
Henry W. Brosin. He says the great strength of the group is their willingness to tolerate
partial answers, proposes that color responses on the Rorschach test should be of especial
interest to neurophysiologists, and wonders if psychology will not find its “great man” in
the person who can combine the concepts of Freud with the ideals of Wundt. These remarks
by Dr. Brosin were extemporaneous.

If this review gives a somewhat confused picture of the book, then it correctly summarizes
the reviewer’s impression. The papers are uniformly good and will be useful to give
graduate students an introduction to the thinking of these famous scientists. The discussion 
is heterogeneous, sometimes inaccurate, seldom documented, and usually disorganized. 
The possibility of a general theory of behavior based on cerebral mechanisms looks faintly 
hopeful at first, but deteriorates as the symposium progresses. At least one reader closed 
the book with the impression that the study of behavior has much more to contribute to 
our knowledge of cerebral mechanisms than vice versa. 


Massachusetts Institute of Technology G. A. Miller 


NORMAN FREDERIKSEN AND W. B. SCHRADER, Adjustment to College. Princeton: Educa-
tional Testing Service, 1951, pp. XVII + 504. 


Soon after the veterans began to pour into our colleges and universities at the end of 
World War II, educators started to deliver opinions and research workers analyses of data 
about veterans’ adjustment to college. The large number and complexity of the factors 
involved cast doubts on both the opinions and the analyses of data. Opinions were too 
vulnerable to the effects of sentimental and financial considerations. Virtually all of the 
reported studies failed to control one or more of such relevant factors as year in class, pre- 
dicted academic performance (e.g., high-school rank and college aptitude score), division of 
the college in which the student was enrolled, and the specifics associated with one institu- 
tion as compared to others. 

It is fortunate, then, that this study of a well-planned sample of sixteen colleges and 
universities was made possible through the financial assistance of the Carnegie Corporation 
and the consultative resources of the Educational Testing Service, of which the authors are 
staff members. Here we have a definitive answer based upon a sophisticated analysis of the 
question. 

Not only was academic achievement, through the medium of grades, investigated but 
a questionnaire was administered dealing with facts of personal history and status, atti- 
tudes toward college and college grades, worries and anxieties, use of time, and factors 
bearing on the importance of the “GI Bill” in determining college attendance. The questionnaire
was administered in the fall of 1946 and a sample of approximately 11,000 dis-
tributed through the sixteen institutions was drawn. 

Through an application of covariance analysis which permitted the use of an index 
representing the variance in grades unaccounted for by measures of high-school success and 
aptitude and achievement, ability was ruled out as a factor in the comparisons of veterans 
and non-veterans. These two groups were compared within institution, further subdivided 
by sex, class, and division. The authors find that the hypothesis that veterans excel non-veterans
of equal ability is supported. For freshmen, however, this tendency is small. Even in the 
most extreme instances (groups), the advantage of the veterans would on the average 
amount to no more than the difference between C and C+. 
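
The logic of this comparison can be illustrated numerically. The following is only a minimal sketch and not the authors' actual procedure: it assumes a single linear regression of grades on two hypothetical predictors (high-school rank and an aptitude score), takes the residual as the index of grades unaccounted for by ability, and compares the mean residual of veterans with that of non-veterans. All names and the simulated data are invented for illustration.

```python
import numpy as np

def residual_index(grades, predictors):
    """Grades minus their least-squares prediction from the ability measures."""
    X = np.column_stack([np.ones(len(grades)), predictors])  # add intercept column
    coef, *_ = np.linalg.lstsq(X, grades, rcond=None)
    return grades - X @ coef

def veteran_advantage(grades, predictors, is_veteran):
    """Mean residual of veterans minus mean residual of non-veterans."""
    resid = residual_index(grades, predictors)
    return resid[is_veteran].mean() - resid[~is_veteran].mean()

# Simulated data (assumption): veterans earn 0.2 grade points more at equal ability.
rng = np.random.default_rng(0)
n = 400
hs_rank = rng.normal(size=n)
aptitude = rng.normal(size=n)
is_veteran = rng.random(n) < 0.5
grades = (2.0 + 0.5 * hs_rank + 0.4 * aptitude
          + 0.2 * is_veteran + rng.normal(scale=0.3, size=n))

ability = np.column_stack([hs_rank, aptitude])
advantage = veteran_advantage(grades, ability, is_veteran)
print(round(float(advantage), 2))
```

In the study itself the comparison was made separately within each institution, sex, class, and division; the sketch pools a single simulated group for brevity.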

There is a wealth of information in the analyses of the questionnaire responses, but 
for the most part no spectacular differences between veterans and non-veterans in motivational
adjustment are revealed. Veterans’ worries, if anything, are fewer than the non-
veterans’, though somewhat differently distributed. The veterans were more concerned
about financial problems and concentration, while non-veterans were more concerned about 
feelings of inferiority and social adjustment. Of special interest from the point of view of 
national educational policy are findings related to non-aptitude determiners of going to 
college. The veterans were drawn from families of less educational background and lower 
income than their non-veteran counterparts. At the same time students who are older and 
from lower socio-economic groups tend to be overachievers. Specificity and certainty of 
vocational choice were other outstanding factors in overachievement. 

The meticulous interpretation of data is marred by one instance. The lack of correla- 
tion between date of testing and test scores and grade achievement is taken as evidence 
that “the time of taking the test has little effect on the predictive value of the test” (p. 59).
This lack of correlation with date of testing does not preclude the possibility that the correla- 
tion of test scores taken a year earlier with grades will be lower than the correlation of 
test scores taken at the start of the current year. However, this fault is a minor one in an
otherwise well-planned, thoroughly analyzed study.


University of Michigan Edward S. Bordin 


BOOKS RECEIVED 


ASHBY, W. ROSS. Design for a Brain. New York: John Wiley and Sons, Inc., 1952, pp.
ix + 260. 

BARLOW, FRED. Mental Prodigies. New York: Philosophical Library, 1952, pp. 256.

BERRIEN, F. K. Practical Psychology (Revised Edition). New York: The Macmillan Co.,
1952, pp. xv + 640. 

Bureau of Psychology, U.P., Allahabad. An Educational Guidance Project. Allahabad: 
Bureau of Psychology, Uttar Pradesh, 1952, ii + 82. 

CHESSER, EUSTACE. Cruelty to Children. New York: Philosophical Library, 1952, pp. 159.

COCHRAN, WILLIAM G. Sampling Techniques. New York: John Wiley and Sons, Inc., 1953,
xiv + 330. 

COOMBS, CLYDE H. A Theory of Psychological Scaling. Engineering Research Bulletin No. 34.
Ann Arbor: Univ. Michigan Press, 1952, pp. vi + 94. 

DEMING, W. E. Some Theory of Sampling. New York: John Wiley and Sons, Inc., 1950,
pp. xvii + 602. 

EDUCATIONAL TESTING SERVICE. A Summary of Statistics on the Selective Service College
Qualification Test. Princeton: Educational Testing Service, 1952, pp. 71. 

GOULDEN, CYRIL H. Methods of Statistical Analysis. New York: John Wiley and Sons, Inc.,
1952, pp. vi + 440. 

HYNDMAN, OLAN R., M.D. The Origin of Life and the Evolution of Living Things. New York:
Philosophical Library, 1952, pp. xxi + 648. 

KARPF, FAY B. The Psychology and Psychotherapy of Otto Rank. New York: Philosophical
Library, 1953, ix + 129. 

LANDIS, PAUL H., AND STONE, CAROL L. The Relationship of Parental Authority Patterns to
Teenage Adjustments. Bulletin No. 538. Washington Agricultural Experiment Stations. 
Pullman, Wash.: State College of Washington, 1952, pp. 31. 

MAIER, NORMAN R. F. Principles of Human Relations: Applications to Management. New
York: John Wiley and Sons, Inc., 1952, pp. ix + 474. 

PALMER, HAROLD, M.D. The Philosophy of Psychiatry. New York: Philosophical Library,
1952, pp. ix + 70. 

PODOLSKY, EDWARD (Editor). Encyclopedia of Aberrations. New York: Philosophical
Library, 1953, viii + 550.

RAO, C. RADHAKRISHNA. Advanced Statistical Methods in Biometric Research. New York:
John Wiley and Sons, Inc., 1952, pp. xvii + 390.

RIESE, WALTHER. The Conception of Disease: Its History, its Versions and its Nature.
New York: Philosophical Library, 1953, 120.

ROBACK, A. A. A History of American Psychology. New York: Library Publishers, 1952,
pp. xiv + 426. 

TIPPETT, L. H. C. The Methods of Statistics. New York: John Wiley and Sons, Inc., 1952,
pp. 365. 

TRAXLER, ARTHUR E.; JACOBS, ROBERT; SELOVER, MARGARET; AND TOWNSEND, AGATHA.
Introduction to Testing and the Use of Test Results in Public Schools. New York: 
Harper Brothers, 1953, x + 118. 

WILDER, RAYMOND L. Introduction to the Foundations of Mathematics. New York: John
Wiley and Sons, Inc., 1952, pp. xiv + 305. 

WOLFLE, DAEL (chairman), et al. Improving Undergraduate Instruction in Psychology.
New York: The Macmillan Co., 1952, pp. vi + 58. 














